Abstract


Shareable data services providing consistency guarantees, such as atomicity (linearizability), make building distributed 
systems easier. However, combining linearizability with efficiency in practical algorithms is difficult. A reconfigurable 
linearizable data service, called RAMBO, was developed by Lynch and Shvartsman. This service guarantees consistency 
under dynamic conditions involving asynchrony, message loss, node crashes, and new node arrivals. The specification 
of the original algorithm is given at an abstract level aimed at concise presentation and formal reasoning about 
correctness. The algorithm propagates information by means of gossip messages. If the service is in use for a 
long time, the size and the number of gossip messages may grow without bound. This paper presents a consistent 
data service for long-lived objects that improves on RAMBO in two ways: it includes an incremental communication 
protocol and a leave service. The new protocol takes advantage of the local knowledge, and carefully manages the 
size of messages by removing redundant information, while the leave service allows the nodes to leave the system 
gracefully. The new algorithm is formally proved correct by forward simulation using levels of abstraction. An 
experimental implementation of the system was developed for networks-of-workstations. The paper also includes 
selected analytical and preliminary empirical results that illustrate the advantages of the new algorithm. 


1 Introduction 


This paper presents a practical algorithm implementing long-lived, survivable, atomic read/write objects 
in dynamic networks, where participants may join, leave, or fail during the course of computation. The 
survivability of data is ensured through redundancy: the data is replicated and maintained at several network 
locations. Replication introduces the challenges of maintaining consistency among the replicas, and managing 
dynamic participation as the collections of network locations storing the replicas change due to arrivals, 
departures, and failures of nodes. 

A new approach to implementing atomic read/write objects for dynamic networks was developed by Lynch 
and Shvartsman [1] and extended by Gilbert et al. [2]. They developed a memory service called RAMBO (Reconfigurable
Atomic Memory for Basic Objects) that maintains atomic, a.k.a. linearizable, readable/writable
data in highly dynamic environments. In order to achieve availability in the presence of failures, the objects 
are replicated at several network locations. In order to maintain consistency in the presence of small and 
transient changes, the algorithm uses configurations of locations, each of which consists of a set of members 
plus sets of read-quorums and write-quorums. In order to accommodate larger and more permanent changes, 
the algorithm supports reconfiguration, by which the set of members and the sets of quorums are modified. 
Any configuration may be installed at any time. Obsolete configurations can be removed from the system 
without interfering with the ongoing read and write operations. The algorithm tolerates arbitrary patterns 
of asynchrony, node crashes, and message loss. It is formally shown [2, 1] that atomicity is maintained 
in any execution of the algorithm. We developed an experimental implementation of this algorithm on a
network-of-workstations [3]. 

The original RAMBO algorithm is formulated at an abstract level aimed at concise specification and formal 
reasoning about the algorithm’s correctness. Consequently, the algorithm incorporates a simple communication
protocol that maintains very little protocol state. Nevertheless, showing correctness requires careful
arguments involving subtle race conditions [2, 1]. The algorithm propagates information among the partic- 
ipants by means of gossip messages that contain information representing the sender’s state. The number 
and the size of gossip messages may in fact grow without bound. This renders the algorithm impractical for 
use in long-lived applications. 

The gossip messages in RAMBO include the set of participants, and the size of these messages increases 
over time for two reasons. First, RAMBO allows new participants to join the computation, but it does not 
allow the participants to leave gracefully. In order to leave, the participants must pretend to crash. Given
that in asynchronous systems failure detection is difficult, it may be impossible to distinguish departed nodes 
from the nodes that crash. Second, RAMBO gossips information among the participants without regard for 
what may already be known at the destination. Thus a participant will repeatedly gossip a substantial amount
of information to others even if it did not learn anything new since the last time it gossiped. While such
redundant gossip helps tolerate message loss, it substantially increases the communication burden. Given
that the ultimate goal for this algorithm is to be used in long-lived applications, and in dynamic networks
with an unknown and possibly infinite universe of nodes, the algorithm must be carefully refined to substantially
improve its communication efficiency. 


Contributions. The paper presents a new algorithm for reconfigurable atomic memory for dynamic net- 
works. The algorithm, called LL-RAMBO, makes implementing atomic survivable objects practical in long- 
lived systems by managing the knowledge accumulated by the participants and the size of the gossip messages. 
Each participating node maintains a more complicated protocol state and, with the help of additional local 
processing, this investment is traded for substantial reductions in the size and the number of gossip messages. 
Based on [2, 1], we use Input/Output Automata (IOA) [4] to specify the algorithm, then prove it correct 
in two stages by forward simulation, using levels of abstraction. We include analytical and preliminary 
empirical results illustrating the advantages of the new algorithm. In more detail, our contributions are as 
follows. 


1. We develop L-RAMBO that implements an atomic memory service and includes a leave service (Sect. 3). 
We prove correctness (safety) of L-RAMBO by forward simulation of RAMBO, hence we show that every 
trace of L-RAMBO is a trace of RAMBO. 


2. We develop LL-RAMBO by refining L-RAMBO to implement incremental gossip (Sect. 4). We prove 
that LL-RAMBO implements the atomic service by forward simulation of L-RAMBO. This shows that 
every trace of LL-RAMBO is a trace of L-RAMBO, and thus a trace of RAMBO. The proof involves 
subtle arguments relating the knowledge extracted from the local state to the information that is not 
included in gossip messages. We present the proof in two steps for two reasons: (i) the presentation
matches the intuition that the leave service and the incremental gossip are independent, and (ii) the
resulting proof is simpler than a direct simulation of RAMBO by LL-RAMBO.


3. We show (Sect. 5) that LL-RAMBO consumes fewer communication resources than RAMBO, while
preserving the same read and write operation latency, which under certain steady-state assumptions
is at most 8d time, where d is the maximum message delay, unknown to the algorithm. Under these
assumptions, in runs with periodic gossip, LL-RAMBO achieves substantial reductions in communication.


4. We implemented all algorithms on a network-of-workstations. Preliminary empirical results comple- 
ment the analytical comparison of the two algorithms (Sect. 6). 


Background. Several approaches can be used to implement consistent data in (static) distributed systems. 
Starting with Gifford [5] and Thomas [6], many algorithms used collections of intersecting sets of object
replicas to solve the consistency problem. Upfal and Wigderson [7] use majority sets to emulate shared
memory. Vitányi and Awerbuch [8] use matrices of registers where the rows and the columns are written and
respectively read by specific processors. Attiya, Bar-Noy and Dolev [9] use majorities to implement shared
objects in static message passing systems. Extensions with reconfigurable quorums have been explored [10, 11].
These systems have limited ability to support long-lived data when the longevity of processors is limited. 
Virtual synchrony [12], and group communication services (GCS) in general [13], can be used to implement 
consistent objects, e.g., by using a global totally ordered broadcast. The universe of nodes in a GCS can 
evolve; however, forming a new view is indicated after a single failure and can take a substantial time, while
reads and writes are delayed during view formation. In our algorithm, as in [1, 2], reads and writes can make 
progress during reconfiguration. In the current approach, arbitrary new configurations can be introduced. 
This yields a more dynamic system compared to [14, 15, 16, 17, 18] that would require that some new 
quorums include nodes from the old quorums, thus restricting the choice of the new configuration through 
the static constraints that need to be satisfied even before the reconfiguration. 

The work on reconfigurable atomic memory [10, 1, 2] results in algorithms that are more dynamic
because they place fewer restrictions on the choice of new configurations and allow for the universe of
processors to evolve arbitrarily. However, these approaches are based on abstract communication protocols
that are not suited for long-lived systems. In this paper we provide a long-lived solution by introducing
graceful processor departures and incremental gossip. The idea of incrementally propagating
information among participating nodes has been previously used in a variety of different settings, e.g.,
[19, 20, 21, 22, 23, 24, 25]. Incremental gossip is also
called anti-entropy [26, 27] or reconciliation [28]; these concepts are used in database replication algorithms,
however, due to the nature of the application, they make stronger assumptions, e.g., ordering of messages.

Fig. 1: RAMBO architecture depicting automata at nodes i and j, the channels, and the Recon service.


Document structure. In Section 2 we review RAMBO. In Section 3 we specify and prove correct the 
graceful leave service. Section 4 presents the ultimate system, with leave and incremental gossip, and proves 
it correct. In Section 5 we present the analytical performance analysis, and in Section 6 we demonstrate the 
experimental results. We conclude in Section 7. A preliminary version of this paper appears in [29]. 


2 Reconfigurable Atomic Memory for Basic Objects (RAMBO) 


We now describe the RAMBO algorithm as presented in [1], including the rapid configuration upgrade as given 
in [2]. The algorithm is given for a single object (atomicity is preserved under composition, and multiple 
objects can be composed to yield a complete shared memory). For the detailed Input/Output Automata 
code see Appendix A or [1, 2]. In order to achieve fault tolerance and availability, RAMBO replicates 
objects at several network locations. In order to maintain memory consistency in the presence of small and 
transient changes, the algorithm uses configurations, each of which consists of a set of members plus sets 
of read-quorums and write-quorums. The quorum intersection property requires that every read-quorum 
intersect every write-quorum. In order to accommodate larger and more permanent changes, the algorithm 
supports reconfiguration, by which the set of members and the sets of quorums are modified. Any quorum 
configuration may be installed, and atomicity is preserved in all executions. 

The algorithm consists of three kinds of automata: (i) Joiner automata, handling join requests, (ii) Recon
automata, handling reconfiguration requests (recon) and generating a totally ordered sequence of configurations,
and (iii) Reader-Writer automata, handling read and write requests, managing configuration upgrades,
and implementing gossip messaging. The overall system is the composition of these automata with the
automata modelling point-to-point communication channels, see Fig. 1.

The Joiner automaton is quite straightforward, simply sending a join message when node i joins, and
sending a join-ack message whenever a join message is received. The Recon automaton establishes a total
ordering of configurations; it is implemented using a consensus service (e.g., Paxos [30]), or it can be a much
simpler service when the set of possible configurations is finite [31]. Here we assume that a total ordering
exists, and we need not discuss this further (for details see [1]).


Domains:
  I, a set of processes
  V, a set of legal values
  C, a set of configurations, each consisting of members, read-quorums, and write-quorums

Input:
  join(rambo, J)_i, J a finite subset of I − {i}, i ∈ I, such that if i = i_0 then J = ∅
  read_i, i ∈ I
  write(v)_i, v ∈ V, i ∈ I
  recon(c, c′)_i, c, c′ ∈ C, i ∈ members(c), i ∈ I
  fail_i, i ∈ I

Output:
  join-ack(rambo)_i, i ∈ I
  read-ack(v)_i, v ∈ V, i ∈ I
  write-ack_i, i ∈ I
  recon-ack(b)_i, b ∈ {ok, nok}, i ∈ I
  report(c)_i, c ∈ C, i ∈ I


Fig. 2: RAMBO: External signature.


The external signature of the service is in Fig. 2. A client at node i uses the join_i action to join the system.
After receiving join-ack_i, the client can issue read_i and write_i requests, which result in read-ack_i and write-ack_i
responses. The client can issue a recon_i request to initiate a reconfiguration. The fail_i action models a crash at node i.
Remark. The fail_i action is an input action coming from the environment. In the original RAMBO it is
used solely to signal the crash of node i. In this paper we introduce a new kind of action coming from the
environment: the leave_i action. This is discussed in detail in the next section. In showing correctness of the
new algorithm, we assume that fail_i and leave_i are synonymous: both come from the environment and both
result in node i ceasing to participate in the computation. Thus we model the graceful departures as a form
of failure.


Every node of the system maintains a tag and a value for the data object. Every time a new value is 
written, it is assigned a unique tag, with ties broken by process-ids. These tags are used to determine an 
ordering of the write operations, and therefore determine the value that a read operation should return. Read 
and write operations have two phases, query and propagation, each accessing certain quorums of replicas. 
Assume the operation is initiated at node i. First, in the query phase, node i contacts read quorums to
determine the most recent known tag and value. Then, in the propagation phase, node i contacts write
quorums. If the operation is a read operation, the second phase propagates the largest discovered tag and
its associated value. If the operation is a write operation, node i chooses a new tag, strictly larger than
every tag discovered in the query phase. Node i then propagates the new tag and the new value to a write
quorum. Note that every operation accesses both read and write quorums.
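
The two-phase structure described above can be illustrated with a minimal sketch. It is not the RAMBO automaton: it assumes a single fixed configuration, synchronous in-memory access to replicas, and no failures or gossip. The names Replicas, query, propagate, read, and write are hypothetical and introduced only for this illustration; tags are modelled as pairs (sequence number, process id), so ties are broken by process ids.

```python
class Replicas:
    """Hypothetical in-memory stand-in for one configuration of replicas."""

    def __init__(self, members, read_quorums, write_quorums, initial=None):
        # Quorum intersection: every read-quorum meets every write-quorum.
        assert all(r & w for r in read_quorums for w in write_quorums)
        # Each replica stores (tag, value); a tag is (sequence number, process id).
        self.store = {m: ((0, 0), initial) for m in members}

    def query(self, quorum):
        # Query phase: largest (tag, value) pair held by the contacted quorum.
        return max((self.store[m] for m in quorum), key=lambda tv: tv[0])

    def propagate(self, quorum, tag, value):
        # Propagation phase: install (tag, value) wherever it is newer.
        for m in quorum:
            if tag > self.store[m][0]:
                self.store[m] = (tag, value)


def read(replicas, read_quorum, write_quorum):
    tag, value = replicas.query(read_quorum)
    replicas.propagate(write_quorum, tag, value)  # a read also writes back
    return value


def write(replicas, pid, value, read_quorum, write_quorum):
    (seq, _), _ = replicas.query(read_quorum)
    tag = (seq + 1, pid)  # strictly larger than every tag discovered
    replicas.propagate(write_quorum, tag, value)
    return tag


majorities = [{1, 2}, {2, 3}, {1, 3}]
r = Replicas({1, 2, 3}, majorities, majorities)
write(r, 1, "x", {1, 2}, {2, 3})
print(read(r, {1, 2}, {1, 3}))
```

Because the read quorum used by any later operation intersects the write quorum used by the earlier write, the later operation discovers the earlier tag. This is the property the quorum intersection requirement is meant to provide.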

Configurations go through three stages: proposal, installation, and upgrade. First, a configuration is 
proposed by a recon event. Next, if the proposal is successful, the Recon service achieves consensus on the 
new configuration, and notifies participants with decide events. When every non-failed member of the prior 
configuration has been notified, the configuration is installed. The configuration is upgraded when every 
configuration with a smaller index has been removed. Upgrades are performed by the configuration upgrade 
operations. Each upgrade operation requires two phases, a query phase and a propagate phase. The first 
phase contacts a read-quorum and a write-quorum from the old configurations, and the second phase contacts 
a write-quorum from the new configuration. All three operations, read, write, and configuration upgrade, are 
implemented using gossip messages. 

The cmap is a mapping from integer indices to C ∪ {⊥, ±}, initially mapping every integer
to ⊥. It records which configurations are active, which have not yet been created, indicated by ⊥, and which
have already been removed, indicated by ±. The total ordering on configurations determined by Recon
ensures that all nodes agree on which configuration is stored in each position in cmap. We define c(k) to be
the configuration associated with index k.
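
As an illustration of how such a map can be manipulated, the sketch below models a cmap as a finite list whose unlisted positions are implicitly ⊥. The operator names update, extend, and truncate appear in the code of Fig. 3, but this section does not define them; the definitions used here are assumptions based on our reading of [1] (pointwise maximum under the order ⊥ < c < ±, filling only undefined positions, and cutting the map at the first ⊥), and the sentinel strings BOT and GONE are hypothetical.

```python
BOT, GONE = "BOT", "GONE"  # stand-ins for the symbols ⊥ and ±


def _rank(x):
    # Assumed order on entries: ⊥ < any configuration < ±.
    return 0 if x == BOT else 2 if x == GONE else 1


def _pad(cm, n):
    # Positions beyond the end of the list are implicitly ⊥.
    return list(cm) + [BOT] * (n - len(cm))


def update(cm, other):
    """Pointwise maximum; entries of equal rank keep the left value."""
    n = max(len(cm), len(other))
    return [c if _rank(c) >= _rank(d) else d
            for c, d in zip(_pad(cm, n), _pad(other, n))]


def extend(cm, other):
    """Fill ⊥ positions of cm with configurations (never with ±) from other."""
    n = max(len(cm), len(other))
    return [d if c == BOT and _rank(d) == 1 else c
            for c, d in zip(_pad(cm, n), _pad(other, n))]


def truncate(cm):
    """Map every position at or after the first ⊥ to ⊥."""
    out, seen_bot = [], False
    for c in cm:
        seen_bot = seen_bot or c == BOT
        out.append(BOT if seen_bot else c)
    return out


def is_truncated(cm):
    return truncate(cm) == list(cm)


print(update([GONE, "c1"], ["c0", "c1", "c2"]))
print(truncate(["c0", BOT, "c2"]))
```

Under these assumed definitions, a node that has removed configuration 0 and learns of configuration 2 from a gossip message ends up with the map (±, c1, c2), while a map with a gap is cut back to its gap-free prefix by truncate.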

The record op is used to store information about the current phase of an ongoing read or write operation, 
while upg is used for information about an ongoing configuration upgrade operation. A node can process 
read and write operations concurrently with configuration upgrade operations. The op.cmap subfield records 
the configuration map associated with the operation. For read or write operations this consists of the node’s 
cmap when a phase begins, augmented by any new configurations discovered during the phase. A phase 
completes when the initiator has exchanged information with quorums from every valid configuration in 
op.cmap. The pnum subfield records the phase number when the phase begins, allowing the initiator to
determine which responses correspond to the phase. The acc subfield accumulates the ids of the nodes from
which quorums have responded during the current phase. 


Remark. RAMBO uses monotonically increasing phase numbers to detect fresh responses. The new phase
number is computed by incrementing the previous phase number. Here we assume that new phase numbers
are computed by adding an arbitrary positive integer to the previous phase number. This is sufficient for the
correctness of RAMBO, and it substantially simplifies the proof of correctness of our algorithm.


Finally, the nodes communicate via point-to-point channels with the following properties: (1) messages 
are not corrupted, (2) messages may be lost, (3) messages may be delivered in an arbitrary order, (4) messages 
may be duplicated, and (5) messages are not spontaneously generated by the channels, which means that 
for each receive event there is a corresponding send event. We denote by Channel_{i,j} the channel from node
i to node j.
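
The five channel properties can be mimicked by a small simulation, which may be useful when experimenting with gossip protocols under the same assumptions. The sketch below is not the Channel automaton of the specification; the class name LossyChannel and the parameters seed, p_loss, and p_dup are hypothetical, and probabilistic loss and duplication are just one way of resolving the nondeterminism that the model allows.

```python
import random


class LossyChannel:
    """Hypothetical one-directional channel from node i to node j."""

    def __init__(self, seed=0, p_loss=0.3, p_dup=0.2):
        self.rng = random.Random(seed)
        self.p_loss = p_loss
        self.p_dup = p_dup
        self.sent = []        # every message ever handed to the channel
        self.in_transit = []  # messages that may still be delivered

    def send(self, m):
        self.sent.append(m)
        if self.rng.random() < self.p_loss:
            return                      # (2) the message is lost
        self.in_transit.append(m)       # (1) stored unmodified

    def recv(self):
        if not self.in_transit:
            return None                 # (5) nothing is generated spontaneously
        k = self.rng.randrange(len(self.in_transit))  # (3) arbitrary order
        m = self.in_transit[k]
        if self.rng.random() >= self.p_dup:
            self.in_transit.pop(k)      # otherwise a copy stays: (4) duplication
        return m


ch = LossyChannel(seed=7)
for k in range(5):
    ch.send(k)
print([ch.recv() for _ in range(8)])
```

Every value returned by recv was previously passed to send, which is the simulation's counterpart of the requirement that each receive event has a corresponding send event.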


3 RAMBO with Graceful Leave 


Here we augment RAMBO with a leave service allowing the participants to depart gracefully. We prove that 
the new algorithm, called L-RAMBO, implements atomic memory. 

Nodes participating in RAMBO communicate by means of gossip messages containing the latest object 
value and bookkeeping information that includes the set of known participants. RAMBO allows participants 
to fail or leave without warning. Since in asynchronous systems it is difficult or impossible to distinguish
slow or departed nodes from crashed nodes, RAMBO implements gossip to all known participants, regardless
of their status. In highly dynamic systems this leads to (a) the size of gossip messages growing without
bound, and (b) the number of messages sent in each round of gossip increasing as new participants join the
computation. 

L-RAMBO allows graceful node departures by letting a node that wishes to leave the system send
notification messages to an arbitrary subset of known participants. When another node receives such a
notification, it marks the sender as departed, and stops gossiping to that node. The remaining nodes propagate
the information about the departed nodes to other participants, eventually eliminating gossip to nodes that
departed gracefully.


Specification of L-RAMBO. We interpret the fail_i event as synonymous with the leave_i event: both are
inputs from the environment and both result in node i ceasing to participate in all operations. The difference
between fail_i and leave_i is strictly internal: leave_i allows a node to leave gracefully. The well-formedness
conditions of RAMBO and the specifications of Joiner_i and Recon remain unchanged. The introduction of
the leave service affects only the specification of the Reader-Writer_i automata. These changes for L-RAMBO
are given in Fig. 3, except for the segments of code that should be disregarded until the ultimate long-lived
algorithm LL-RAMBO is presented in Section 4 (we combine the two specifications to avoid unnecessary
repetition and simplify presentation).

The signature of the Reader-Writer_i automaton is extended with actions recv(leave)_{j,i} and send(leave)_{i,j}, used
to communicate the graceful departure status. The state of Reader-Writer_i is extended with new state
variables: departed_i, the set of nodes that left the system, as known at node i, and leave-world_i, the set of nodes
that node i can inform of its own departure once it decides to leave; at that point it sets leave-world_i to world_i − departed_i.

The key algorithmic changes involve the actions recv(m)_{j,i} and send(m)_{i,j}. In the original RAMBO algorithm,
a gossip message m includes: W, the world of the sender; v, the object value, and its tag t; cm, the cmap; pns, the
phase number of the sender; and pnr, the phase number of the receiver that is known to the sender. The
gossip message m in L-RAMBO also includes D, a new parameter, initialized to the departed set of the sender.

We now detail the leave protocol. Assume that nodes i and j participate in the service, and node i wishes
to depart following the leave_i event, whose effects set the state variable failed_i to true in Joiner_i, Recon_i, and
Reader-Writer_i. The leave_i action at Reader-Writer_i (see Fig. 3) also initializes the set leave-world_i to the
identifiers found in world_i, less those found in departed_i. Now Reader-Writer_i is allowed to send one leave
notification to any node in leave-world_i. This is done by the send(leave)_{i,j} action that arbitrarily chooses
the destination j from leave-world_i. Note that node i may nondeterministically choose the original fail_i
action (see Appendix A or [1]), in which case no notification messages are sent (this is the “non-graceful”
departure).


Signature:
  As in RAMBO, plus new actions:
    Input: leave_i, recv(leave)_{j,i}
    Output: send(leave)_{i,j}

State:
  As in RAMBO, plus new states:
    leave-world, a finite subset of I, initially ∅
    departed, a finite subset of I, initially ∅
    ig(k), for k ∈ I, with initially ig(k).w-known = ∅, ig(k).w-unack = ∅,
      ig(k).d-known = ∅, ig(k).d-unack = ∅

Transitions at i:

  Input recv(⟨W, D, v, t, cm, pns, pnr⟩)_{j,i}
  Effect:
    if ¬failed ∧ status ≠ idle then
      status ← active
      world ← world ∪ W
      departed ← departed ∪ D

      [h] hrecv-W(j, i, pnr) ← W
      [h] hrecv-D(j, i, pnr) ← D
      ig(j).w-known ← ig(j).w-known ∪ W
      ig(j).w-unack ← ig(j).w-unack − W
      ig(j).d-known ← ig(j).d-known ∪ D
      ig(j).d-unack ← ig(j).d-unack − D

      if pnr > ig(j).p-ack then
        ig(j).w-known ← ig(j).w-known ∪ ig(j).w-unack
        ig(j).w-unack ← world − ig(j).w-known
        ig(j).d-known ← ig(j).d-known ∪ ig(j).d-unack
        ig(j).d-unack ← departed − ig(j).d-known
        ig(j).p-ack ← pnum1

      if t > tag then (value, tag) ← (v, t)
      cmap ← update(cmap, cm)
      pnum2(j) ← max(pnum2(j), pns)
      if op.phase ∈ {query, prop} ∧ pnr > op.pnum then
        op.cmap ← extend(op.cmap, truncate(cm))
        if op.cmap ∈ Truncated then
          op.acc ← op.acc ∪ {j}
        else
          pnum1 ← pnum1 + 1
          op.acc ← ∅
          op.cmap ← truncate(cmap)
      if upg.phase ∈ {query, prop} ∧ pnr > upg.pnum then
        upg.acc ← upg.acc ∪ {j}

  Input recv(leave)_{j,i}
  Effect:
    if ¬failed ∧ status = active then
      departed ← departed ∪ {j}

  Output send(⟨W, D, v, t, cm, pns, pnr⟩)_{i,j}
  Precondition:
    ¬failed
    status = active
    j ∈ (world − departed)
    ⟨W, D, v, t, cm, pns, pnr⟩ =
      ⟨world − ig(j).w-known, departed − ig(j).d-known,
       value, tag, cmap, pnum1, pnum2(j)⟩
  Effect:
    pnum1 ← pnum1 + 1
    [h] h-msg ← h-msg ∪ {⟨⟨W, D, v, t, cm, pns, pnr⟩, i, j⟩}
    [h] hs-world(i, j, pns) ← world
    [h] hs-departed(i, j, pns) ← departed
    [h] hs-wknow(i, j, pns) ← ig(j).w-known
    [h] hs-dknow(i, j, pns) ← ig(j).d-known
    [h] hs-wunack(i, j, pns) ← ig(j).w-unack
    [h] hs-dunack(i, j, pns) ← ig(j).d-unack
    [h] hs-pack(i, j, pns) ← ig(j).p-ack

  Input leave_i
  Effect:
    if ¬failed then
      failed ← true
      departed ← departed ∪ {i}
      leave-world ← world − departed

  Output send(leave)_{i,j}
  Precondition:
    j ∈ leave-world
  Effect:
    leave-world ← leave-world − {j}


Fig. 3: Modification of Reader-Writer_i for L-RAMBO, and for LL-RAMBO (the code).


When Reader-Writer_i receives a leave notification from node j, it adds j to its departed_i set. Node
i sends gossip messages to all nodes in the set world_i − departed_i, including information about j’s
departure. When Reader-Writer_i receives a gossip message that includes the set D, it updates its departed_i
set accordingly.
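
The leave protocol described above can be sketched as follows. The sketch keeps only the world, departed, leave-world, and failed state from Fig. 3 and omits the status variable, the object value, tags, cmap, phase numbers, and the incremental-gossip state, so it should be read as an illustration under those simplifying assumptions. The class Node and its method names are hypothetical.

```python
class Node:
    """Hypothetical, stripped-down view of the leave-related state at one node."""

    def __init__(self, pid, world):
        self.pid = pid
        self.world = set(world) | {pid}
        self.departed = set()
        self.leave_world = set()
        self.failed = False

    def gossip_targets(self):
        # Send precondition: not failed, and destination in world - departed.
        return set() if self.failed else self.world - self.departed

    def leave(self):
        # Effect of the leave input: mark self departed, fix the notification set.
        if not self.failed:
            self.failed = True
            self.departed.add(self.pid)
            self.leave_world = self.world - self.departed

    def send_leave(self):
        # Arbitrarily choose a destination; each one is notified at most once.
        return self.leave_world.pop() if self.leave_world else None

    def recv_leave(self, sender):
        if not self.failed:
            self.departed.add(sender)

    def recv_gossip(self, W, D):
        # Gossip carries the sender's world W and departed set D.
        if not self.failed:
            self.world |= W
            self.departed |= D


nodes = {p: Node(p, {1, 2, 3}) for p in (1, 2, 3)}
nodes[1].leave()
dest = nodes[1].send_leave()          # only one notification is sent here
nodes[dest].recv_leave(1)
other = ({2, 3} - {dest}).pop()
nodes[other].recv_gossip(nodes[dest].world, nodes[dest].departed)
print(sorted(nodes[other].gossip_targets()))
```

In this run only one node is notified directly; the other learns of the departure through the set D carried by ordinary gossip, after which neither remaining node lists node 1 among its gossip destinations.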


Atomicity of L-RAMBO service. The L-RAMBO system is the composition of all Reader-Writer_i and Joiner_i
automata, the Recon service, and Channel_{i,j} automata for all i, j ∈ I. We show atomicity of L-RAMBO by
a forward simulation that proves that any trace of L-RAMBO is also a trace of RAMBO, and thus L-RAMBO
implements atomic objects. The proof uses history variables, annotated with the symbol [h] in Fig. 3.

For each i we define h-msg_i to be the history variable that keeps track of all messages sent by the
Reader-Writer_i automaton. Initially, h-msg_i = ∅ for all i ∈ I. Whenever a message m is sent by i to some node j ∈ I
via Channel_{i,j}, we let h-msg_i ← h-msg_i ∪ {⟨m, i, j⟩}. We define h-MSG to be ∪_{i∈I} h-msg_i. (The remaining
history variables are used in reasoning about LL-RAMBO, in Section 4).

The following lemma states that only good messages are sent. 


Lemma 3.1: In any execution of L-RAMBO, if m is a message received by node i in a recv(m)_{j,i} event,
then ⟨m, j, i⟩ ∈ h-MSG, and m ∈ {⟨W, D, v, t, cm, pns, pnr⟩, leave, join}, where ⟨W, D, v, t, cm, pns, pnr⟩ ∈
2^I × 2^I × V × T × CMap × N × N.


Proof. By the definition of Channel_{j,i}, the messages are not corrupted, and for every receive there exists
a preceding send event (messages are not manufactured by the channel). Therefore, m must have been
sent by some node j ∈ I in some earlier event of the execution. Hence ⟨m, j, i⟩ ∈ h-msg_j; by definition,
⟨m, j, i⟩ ∈ h-MSG. Messages are sent only in the Reader-Writer_j automaton’s send(⟨W, D, v, t, cm, pns, pnr⟩)_{j,i}
or send(leave)_{j,i} events, or in the Joiner_j automaton’s send(join)_{j,i} event, and by the code of these events m =
⟨W, D, v, t, cm, pns, pnr⟩, m = leave, or m = join.


Definition 3.2 and Theorem 3.3 are used to show the safety of L-RAMBO. 


Definition 3.2 ([32]): Let A and S be two automata with the same external signature. A forward simulation
from A to S is a relation R ⊆ states(A) × states(S) that satisfies the following two conditions:


1. If t is any initial state of A, then there is an initial state s of S such that s ∈ R(t), where R(t) is an
abbreviation for {s : (t, s) ∈ R}.


2. If t and s ∈ R(t) are reachable states of A and S respectively, and if (t, π, t′) is a step of A, then there
exists an execution fragment of S from s to some s′ ∈ R(t′), having the same trace as step (t, π, t′).


Theorem 3.3 ([32]): If there is a forward simulation from automaton A to automaton S, then traces(A) ⊆ traces(S).
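
For finite automata, the two conditions of Definition 3.2 can be checked mechanically, which may help in building intuition for the proofs that follow. The sketch below is an illustration only: the dictionary encoding of an automaton (keys init, steps, external) and the function names are hypothetical, and the check is applied to every pair in R rather than only to reachable pairs, which is a stronger, hence sufficient, condition.

```python
from collections import deque


def fragment_targets(S, s, trace):
    """States of S reachable from s by a fragment whose trace equals `trace`."""
    start = (s, 0)  # (state, number of trace actions matched so far)
    seen, queue, targets = {start}, deque([start]), set()
    while queue:
        state, k = queue.popleft()
        if k == len(trace):
            targets.add(state)
        for (u, a, v) in S["steps"]:
            if u != state:
                continue
            if a in S["external"]:
                if k < len(trace) and trace[k] == a:
                    nxt = (v, k + 1)   # external step must match the trace
                else:
                    continue
            else:
                nxt = (v, k)           # internal steps leave the trace unchanged
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return targets


def is_forward_simulation(A, S, R):
    # Condition 1: each initial state of A is related to an initial state of S.
    for t in A["init"]:
        if not any((t, s) in R for s in S["init"]):
            return False
    # Condition 2: each step of A is matched by a fragment of S with equal trace.
    for (t, a, t2) in A["steps"]:
        trace = (a,) if a in A["external"] else ()
        for (t0, s) in R:
            if t0 == t and not any((t2, s2) in R
                                   for s2 in fragment_targets(S, s, trace)):
                return False
    return True


# A has an extra internal step ("notify") that S matches by doing nothing.
A = {"init": {"a0"}, "external": {"ack"},
     "steps": [("a0", "notify", "a1"), ("a1", "ack", "a2")]}
S = {"init": {"s0"}, "external": {"ack"}, "steps": [("s0", "ack", "s1")]}
R = {("a0", "s0"), ("a1", "s0"), ("a2", "s1")}
print(is_forward_simulation(A, S, R))
```

The toy example mirrors the shape of the argument used for L-RAMBO: an internal step of the more detailed automaton that has no counterpart in the abstract one is matched by an empty execution fragment.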


Next we show that L-RAMBO implements RAMBO, assuming the environment behavior as (informally) 
described in Sect. 2. Showing well-formedness is straightforward by inspecting the code. The proof of 
atomicity is based on a forward simulation from L-RAMBO to RAMBO. 


Theorem 3.4: L-RAMBO implements atomic read/write objects. 


Proof. We show that the L-RAMBO algorithm simulates RAMBO. Specifically, we show that there exists a
simulation relation R from L-RAMBO to RAMBO that satisfies Definition 3.2. Observe that L-RAMBO and
RAMBO have the same external signatures.

We denote by MSG_R the set of messages in the channel automata of RAMBO and by MSG_L the set
of messages in the channel automata of L-RAMBO.

We define the simulation relation R to map: 


(a) a state t of L-RAMBO to a state s of RAMBO so that every “common” state variable has the same
value. For example, for node i ∈ I, t.world_i = s.world_i, t.pnum1_i = s.pnum1_i, t.cmap_i = s.cmap_i,
etc.


(b) a message m = ⟨W, D, v, t, cm, pns, pnr⟩ ∈ Channel_{i,j}.MSG_L to a message m′ =
⟨W, v, t, cm, pns, pnr⟩ ∈ Channel_{i,j}.MSG_R so that: (i) m.v = m′.v, (ii) m.t = m′.t, (iii) m.cm =
m′.cm, (iv) m.pns = m′.pns, (v) m.pnr = m′.pnr, and (vi) m.W = m′.W.


Recall that the difference between the two algorithms is in the Reader-Writer automata. Therefore, in
order to show that L-RAMBO simulates RAMBO, we focus only on transitions related to the Reader-Writer
automata. (We remind the reader that the detailed code of RAMBO is given in Appendix A.) We now show
that R satisfies Definition 3.2:


1. If t is an initial state of L-RAMBO then there exists an initial state s of RAMBO such that s ∈ R(t), since
all common state variables have the same initial values. For example, ∀i ∈ I, t.world_i = ∅ = s.world_i,
t.pnum1_i = 0 = s.pnum1_i, etc.


2. Suppose t and s are reachable states of L-RAMBO and RAMBO respectively such that s ∈ R(t) and
that (t, π, t′) is a step of L-RAMBO. We show that there exists a state s′ ∈ R(t′) such that there is an execution fragment of
RAMBO from s to s′ that has the same trace as π.


(a) If π = send(m)_{i,j}, i, j ∈ I, where m = ⟨W, D, v, t, cm, pns, pnr⟩, then let s′ be such that
(s, send(m′)_{i,j}, s′), where m′ = ⟨W, v, t, cm, pns, pnr⟩. Both actions have empty trace, since
send(m)_{i,j} (resp. send(m′)_{i,j}) is considered internal with respect to the composition of automata
that comprises automaton L-RAMBO (resp. RAMBO).

From the code of L-RAMBO we have that m = ⟨W, D, v, t, cm, pns, pnr⟩ is placed in
Channel_{i,j}.MSG_L, where m.W = t.world_i, m.v = t.value_i, m.cm = t.cmap_i, m.pns = t.pnum1_i,
and m.pnr = t.pnum2(j)_i. From the state correspondence, since send(m)_{i,j} is enabled,
send(m′)_{i,j} is also enabled. Furthermore, send(m′)_{i,j} places m′ = ⟨W, v, t, cm, pns, pnr⟩ in
Channel_{i,j}.MSG_R, where m′.W = s.world_i, m′.v = s.value_i, m′.cm = s.cmap_i, m′.pns = s.pnum1_i,
and m′.pnr = s.pnum2(j)_i. From the state correspondence of R for t and s we conclude that
the message correspondence of R for t′ and s′ is preserved. Also, the state correspondence for t′
and s′ is preserved, since t′.pnum1_i = t.pnum1_i + 1 = s.pnum1_i + 1 = s′.pnum1_i, and all other
common variables remain unchanged.


(b) If π = recv(m)_{j,i}, i, j ∈ I, where m = ⟨W, D, v, t, cm, pns, pnr⟩, then by Lemma 3.1, ⟨m, j, i⟩ ∈ t.h-MSG.

Let s′ be such that (s, recv(m′)_{j,i}, s′), where m′ = ⟨W, v, t, cm, pns, pnr⟩. Both actions have empty
trace, since recv(m)_{j,i} (resp. recv(m′)_{j,i}) is considered internal in the composition of automata
that comprises automaton L-RAMBO (resp. RAMBO).
Now, since m = ⟨W, D, v, t, cm, pns, pnr⟩ was in Channel_{j,i}.MSG_L, by the message correspondence
of R, message m′ = ⟨W, v, t, cm, pns, pnr⟩ is in Channel_{j,i}.MSG_R, and m and m′ have the
mapping defined above. By inspection of the code of the algorithms, the only difference between
the updates performed by L-RAMBO and RAMBO is that L-RAMBO updates the departed_i set; the remaining
common variables undergo identical updates, hence the state correspondence for t′ and s′ is
preserved. Also, the message correspondence is not affected, since no messages are sent.


(c) If π = send(m)_{i,j}, i, j ∈ I, where m = leave, then let s′ be such that s = s′ and RAMBO does
not perform any action. The action π in L-RAMBO has an empty trace since it is considered
internal in the composition of automata that comprises automaton L-RAMBO. By the code of
this action only the leave-world_i variable is updated; the remaining common state variables of L-RAMBO
are unchanged. Since RAMBO does not perform a step, its state variables are not modified. Hence
the state correspondence for t′ and s′ is preserved.


(d) If π = recv(m)_{j,i}, i, j ∈ I, where m = leave, then by Lemma 3.1, ⟨m, j, i⟩ ∈ t.h-MSG. Let s′ be such
that s = s′ and RAMBO does not perform any action. The action π in L-RAMBO has an empty
trace since it is considered internal in the composition of automata that comprises automaton
L-RAMBO. By the code of this action only the departed_i set is updated; the remaining common state
variables of L-RAMBO are unchanged. Since RAMBO does not perform a step, its state variables
are not modified. Hence the state correspondence for t′ and s′ is preserved.


(e) If 7 = leave;, i € J, then let s’ be such that (s, leave;, s’). By examination of the code the state 
variable failed, is set to true by both L-RAMBO and RAMBO, remaining common state variables 
are not changed. (Algorithm L-RAMBO initializes leave-world; with the identifiers found in the 
set {world; — departed; — {i}}.) Since, s’ failed; = true = t’.failed;, and all remaining common 
state variables are unchanged, the state correspondence for t’ and s’ is preserved. 

(f) If a is not one of the above actions, then we choose s’ such that (s,7,s’). That is, we simulate 
the same action. By inspection of the code of the algorithms it follows that any state change 
after a occurred is identical for both algorithms, hence the state correspondence and message 
correspondence given by R is preserved. 


Therefore, R is a simulation mapping from L-RAMBO to RAMBO per Definition 3.2. Thus, L-RAMBO 
simulates RAMBO. Since RAMBO implements atomic objects [1, 2], so does L-RAMBO. 


Finally, observe from the preconditions of send((W, D, v, t, cm, pns, pnr))_{j,i} that if i ∈ departed_j, then 
send is disabled, i.e., once a node j learns that another node i left the system, j stops gossiping to i. The 
correctness of the leave service is as follows: if node i is placed in the departed set of node j, then i indeed 
departed the service. 


Theorem 3.5: For all states s of an execution of algorithm L-RAMBO, for any i,j ∈ I, if i ∈ s.departed_j then 
i ∈ s.departed_i. 


Proof. The proof is by induction on the length of the execution. The base case holds trivially since all 
sets are empty in the initial state. Assume that the statement of the theorem holds up to state s and consider 
step (s, π, s'). Consider the case where i ∈ s.departed_j, i,j ∈ I. By inspection of the code (see Fig. 3), we 
see that no identifier is ever removed from departed_j. Hence, we have that i ∈ s'.departed_j, and by inductive 
hypothesis, i ∈ s.departed_i and hence i ∈ s'.departed_i, as desired. Let us now consider the more interesting 
case where i ∉ s.departed_j and i ∈ s'.departed_j. That is, the case where j adds i to departed_j during step 
(s, π, s'). By examination of the code, there are only two actions that would make the above case possible: 


(a) π = recv(m)_{i,j}, i,j ∈ I, where m = leave. By Lemma 3.1, m ∈ s.h-MSG. Hence there is a preceding 
send(leave)_{i,j} event. By inspection of the code we observe that by the time the preconditions of the 
output send(leave)_{i,j} are satisfied, the following is already true: i ∈ departed_i. Hence, we have that 
i ∈ s'.departed_i, as desired. 


(b) π = recv(m)_{k,j}, k,j ∈ I (k ≠ i), where m = (W, D, v, t, cm, pnr, pns), and i ∈ D. By Lemma 3.1, 
m ∈ s.h-MSG. Hence there is a preceding send((W, D, v, t, cm, pnr, pns))_{k,j} event, where i ∈ D. By the 
preconditions of this action we have that i ∈ ŝ.departed_k, for some state ŝ ≤ s. By inductive hypothesis, 
we have that i ∈ ŝ.departed_i and hence i ∈ s'.departed_i, as desired. 


This completes the proof. 
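
As an informal illustration of the leave service and of the property in Theorem 3.5, the following Python sketch may help. It is not the automaton code of Fig. 3: the class Node, its method names, and the simplified message handling are hypothetical, and the sketch assumes that a leaving node records itself in its own departed set before it notifies its peers.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    ident: int
    world: set = field(default_factory=set)
    departed: set = field(default_factory=set)
    failed: bool = False

    def leave(self):
        # Assumption: i is placed in departed_i before any leave message is sent.
        self.departed.add(self.ident)
        self.failed = True
        # Hypothetical stand-in for leave-world_i: the peers to be notified.
        return self.world - self.departed

    def can_gossip_to(self, j):
        # Sketch of the send precondition: gossip only to known, non-departed peers.
        return not self.failed and j in self.world and j not in self.departed

    def recv_leave(self, sender):
        self.departed.add(sender)

    def recv_gossip(self, W, D):
        self.world |= W
        self.departed |= D


def theorem_3_5_holds(nodes):
    # If i is in departed_j for some j, then i must be in departed_i.
    return all(i in nodes[i].departed
               for n in nodes.values() for i in n.departed)


nodes = {k: Node(k, world={1, 2, 3}) for k in (1, 2, 3)}
for peer in nodes[1].leave():
    if peer == 2:  # pretend the leave message addressed to node 3 is lost
        nodes[peer].recv_leave(1)
assert not nodes[2].can_gossip_to(1) and nodes[3].can_gossip_to(1)
# Node 3 still learns about the departure indirectly, through gossip from node 2.
nodes[3].recv_gossip(nodes[2].world, nodes[2].departed)
assert not nodes[3].can_gossip_to(1)
assert theorem_3_5_holds(nodes)
```

In this run node 3 never receives the leave message itself, yet it stops gossiping to node 1 once node 2's departed set reaches it.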


4 RAMBO with Graceful Leave and Incremental Gossip 


Now we present, and prove correct, our ultimate algorithm, called LL-RAMBO (Long-Lived RAMBO). The 
algorithm is obtained by incorporating incremental gossip in L-RAMBO, so that the size of gossip messages is 
controlled by eliminating redundant information. In L-RAMBO (resp. RAMBO) the gossip messages contain 
sets corresponding to the sender’s world and departed (resp. world) state variables at the time of the sending 
(Fig. 3). As new nodes join the system and as participants leave the system, the cardinality of these sets 
grows without bound, rendering RAMBO and L-RAMBO impractical for implementing long-lived objects. 
The LL-RAMBO algorithm addresses this issue. The challenge here is to ensure that only the certifiably 
redundant information is eliminated from the messages, while tolerating message loss and reordering. 


Specification of LL-RAMBO. We specify the algorithm by modifying the code of L-RAMBO. In Fig. 3 
the segments of code specify these modifications. (The lines annotated with [h] in Fig. 3 deal with 
history variables that are used only in the proof of correctness.) The new gossip protocol allows node i to 
gossip the information in the sets world_i and departed_i incrementally to each node j ∈ world_i − departed_i. 
Following j's acknowledgment¹, node i never again includes this information in the gossip messages sent to 
j, but will include new information that i has learned since the last acknowledgment by j. 

To describe the incremental gossip in more detail we consider an exchange of gossip messages between 
nodes i and j, where i is the sender and j is the receiver. The sets world and departed are managed 
independently and similarly, and we illustrate incremental gossip using just the set world. First we define new 
data types. Let an incremental gossip identifier be the tuple (w-known, d-known, w-unack, d-unack, p-ack), 
where w-known, d-known, w-unack, and d-unack are finite subsets of I, and p-ack is a natural number. Let 
IG denote the set of all incremental gossip identifiers. Finally, let IGMap be the set of incremental gossip 
maps, defined as the set of mappings I → IG. We extend the state of the Reader-Writer_i automaton with 
ig_i ∈ IGMap. Node i uses the tuple ig(j)_i to keep track of the knowledge it has about the information already 
in possession of, and currently being propagated to, node j (see Fig. 3). Specifically, for each j ∈ world_i, 
ig(j)_i.w-known is the set of node identifiers that i is assured is a subset of world_j, and ig(j)_i.w-unack is the 
set of node identifiers, a subset of world_i, that j needs to acknowledge. The components ig(j)_i.d-known and 
ig(j)_i.d-unack are defined similarly for the departed set. Lastly, ig(j)_i.p-ack is the phase number of i when 
the last acknowledgment from j was received. Initially each of these sets is empty, and p-ack is zero for each 
ig(j)_i with j ∈ I. 
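
The data types just defined can be sketched in Python as follows. This is only an illustration of the tuple structure: the class name IG mirrors the text, while the field names with underscores, the helper new_ig_map, and the use of strings as node identifiers are assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class IG:
    """One incremental gossip identifier, i.e. the tuple ig(j)_i kept by node i for node j."""
    w_known: set = field(default_factory=set)  # identifiers i is assured are in world_j
    d_known: set = field(default_factory=set)  # identifiers i is assured are in departed_j
    w_unack: set = field(default_factory=set)  # part of world_i that j still has to acknowledge
    d_unack: set = field(default_factory=set)  # part of departed_i that j still has to acknowledge
    p_ack: int = 0                             # phase number of i at the last acknowledgment from j


def new_ig_map():
    # IGMap: a mapping I -> IG. Nodes not seen before start with empty sets and p_ack = 0.
    return defaultdict(IG)


ig_i = new_ig_map()
ig_i["j"].w_known |= {"a", "b"}
```

Using a default-constructing map matches the initial condition stated above: every ig(j)_i starts with empty sets and a zero p-ack.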

Node j acknowledges a set of identifiers by including this set in the gossip message, or by sending a phase 
number of i such that node i can deduce that node j received this set of identifiers in some previous message 
from i to j. Messages that include a phase number of i that is larger than ig(j)_i.p-ack are referred to as fresh 
or acknowledgment messages; otherwise they are referred to as stale messages. (This is discussed later.) 


1 We note that this is not an explicit acknowledgment of a message, but some future message that contains information about 
what that node learned. 


In RAMBO, once node i learns about node j, it can gossip to j at any time. We now examine the 
send((W, D, v, t, cm, pns, pnr))_{i,j} action. The world component, W, is set to the difference of world_i and the 
information that i knows that j has, ig(j)_i.w-known, at the time of the send. The remaining components of 
the gossip message are the same as in L-RAMBO. The effect of the send action increases the phase number of 
the sender; this ensures that each message sent is labeled with a unique phase number of the sender. 
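
The send step can be sketched as below. The function make_gossip and the dictionary layout of the message are hypothetical; the components v, t, and cm are omitted, and the departed component is computed in the same way as the world component, as the invariant proofs later in this section assume.

```python
def make_gossip(world, departed, w_known, d_known, pnum1, pnum2_j):
    """Return (message, new_pnum1) for one gossip send from node i to node j.

    world, departed  : the sender's world_i and departed_i
    w_known, d_known : ig(j)_i.w-known and ig(j)_i.d-known
    pnum1            : the sender's current phase number (becomes pns)
    pnum2_j          : the phase number of j known to the sender (becomes pnr)
    """
    message = {
        "W": world - w_known,        # only what j is not yet known to have
        "D": departed - d_known,
        "pns": pnum1,
        "pnr": pnum2_j,
    }
    # The effect of the send increments the sender's phase number,
    # so every message carries a distinct pns.
    return message, pnum1 + 1


msg, pnum1 = make_gossip(
    world={"a", "b", "c", "d"}, departed={"b"},
    w_known={"a", "b"}, d_known=set(), pnum1=7, pnum2_j=3)
```

Here the message carries only the two identifiers that the receiver is not yet known to hold, rather than the full four-element world set.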

Now we examine the recv((W, D, v, t, cm, pns, pnr))_{i,j} action at j (note that we switch i and j relative to the 
code in Fig. 3 to continue referring to the interaction of the sender i and receiver j). The component W 
contains a subset of node identifiers from the sender's world. W is always used to update world_j, ig(i)_j.w-known, 
and ig(i)_j.w-unack. The update of world_j is identical to that in L-RAMBO. By definition ig(i)_j.w-known is 
the set of node identifiers that j is assured that i has, hence we update it with the information in W. Similarly, 
by definition ig(i)_j.w-unack is the set of node identifiers that j is waiting for i to acknowledge. It is possible 
that i has learned some or all of this information from other nodes and it is now a part of W, hence we 
remove any identifiers in W that are also in ig(i)_j.w-unack from ig(i)_j.w-unack; these identifiers do not need 
further acknowledgment. 

What happens next in the effect of recv depends on the value of pnr (the phase number that i believes 
j to be in). First, if pnr ≤ ig(i)_j.p-ack, this message is a stale message, since there must 
have been a prior message from i to j that included a phase number of j higher than or equal to pnr. Hence, 
no further updates take place. Second, if pnr > ig(i)_j.p-ack, this message is considered to be an acknowledgment 
message. By definition ig(i)_j.p-ack contains the phase number of j when the last acknowledgment from i was 
received. Following the last acknowledgment, the phase number of j was incremented, ig(i)_j.p-ack was assigned the 
new value of the phase number of j, and lastly the new set of identifiers to be propagated was recorded. Since node 
i replied to j with a phase number larger than ig(i)_j.p-ack, it means that j and i exchanged messages where i 
learned about the new phase number of j; by the same token i also learned the information included in these 
messages. (We show formally that ig(i)_j.w-unack is always a subset of each message component W that is 
sent to i by j.) Hence, it is safe for j to assume that i at least received the information in ig(i)_j.w-unack 
and to add it to ig(i)_j.w-known. 

Since the choice of i and j is arbitrary, gossip from j to i is defined identically. 
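
The receive step described above can be sketched in Python for the world set only. This is a reading of the prose, not the code of Fig. 3: the class Receiver and its fields are hypothetical, the departed set is omitted, and the order of the three updates in the fresh branch follows the description in the preceding paragraphs.

```python
from dataclasses import dataclass, field


@dataclass
class IG:
    w_known: set = field(default_factory=set)
    w_unack: set = field(default_factory=set)
    p_ack: int = 0


@dataclass
class Receiver:
    world: set
    pnum1: int = 0                          # the receiver's own phase number
    ig: dict = field(default_factory=dict)  # sender identifier -> IG record

    def recv(self, sender, W, pnr):
        """Apply the world-related effects of one gossip message; return 'stale' or 'fresh'."""
        rec = self.ig.setdefault(sender, IG())
        # W is always used, whether the message is stale or fresh.
        self.world |= W
        rec.w_known |= W
        rec.w_unack -= W
        if pnr <= rec.p_ack:
            return "stale"
        # Fresh (acknowledgment) message: the outstanding set is now known to the sender.
        rec.w_known |= rec.w_unack
        self.pnum1 += 1
        rec.p_ack = self.pnum1
        rec.w_unack = self.world - rec.w_known  # new set to be acknowledged
        return "fresh"


r = Receiver(world={"j", "x"}, pnum1=5)
s1 = r.recv("i", {"i"}, pnr=0)   # pnr equals the initial p_ack of 0
s2 = r.recv("i", set(), pnr=3)   # pnr exceeds p_ack: acknowledgment
s3 = r.recv("i", set(), pnr=5)   # p_ack is now 6, so this is stale
r.pnum1 = 8                      # stands in for two sends made by the receiver
s4 = r.recv("i", {"x"}, pnr=7)   # acknowledges everything sent after p_ack = 6
```

After the last message the record for the sender holds every identifier in the receiver's world, so a subsequent gossip message to that sender would carry an empty W.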


Atomicity of LL-RAMBO. We show that any trace of LL-RAMBO is a trace of L-RAMBO, and thus a trace 
of RAMBO. We start by defining the remaining history variables used in the proofs. These variables are 
annotated in Fig. 3 with a [h] symbol. 


• For every tuple (m, i, j) ∈ h-msg_i, where m = (W, D, v, t, cm, pns, pnr) and pns = p, the history 
variable hsent-W(i,j,p) is a mapping from I × I × N to 2^I ∪ {⊥}. This variable records the world 
component of the message, W, when i sends message m to j, and i's phase number is p. Similarly, we 
define a derived history variable hsent-D(i,j,p), a mapping from I × I × N to 2^I ∪ {⊥}. This history 
variable records the departed component of the message, D, when i sends message m to j, and i's phase 
number is p. 


Now we list history variables used to record information for each send((W, D, v, t, cm, pns, pnr))_{i,j} event. 


• Each of the following variables is a mapping from I × I × N to 2^I ∪ {⊥}. hs-world(i,j,pns) records the 
value of world_i, hs-departed(i,j,pns) records the value of departed_i, hs-wknow(i,j,pns) records the 
value of ig(j)_i.w-known, hs-dknow(i,j,pns) records the value of ig(j)_i.d-known, hs-wunack(i,j,pns) 
records the value of ig(j)_i.w-unack, and hs-dunack(i,j,pns) records the value of ig(j)_i.d-unack. 


• hs-pack(i,j,pns) is a mapping from I × I × N to N. It records the value of ig(j)_i.p-ack. 

The last history variables record information in messages at each recv((W, D, v, t, cm, pns, pnr))_{i,j} event. 

• Each of the following is a mapping from I × I × N to 2^I ∪ {⊥}. hrecv-W(j,i,pns) records the component 
W (world) and hrecv-D(j,i,pns) records the component D (departed). 

As in Section 3, we define the history variable h-MSG, which keeps track of messages sent by Reader-Writer 
automata. 

Lemma 4.1: In any execution of LL-RAMBO, if m is a message received by node i in a recv(m)_{j,i} event, 
then (m, j, i) ∈ h-MSG, and m ∈ {(W, D, v, t, cm, pns, pnr), leave, join}, where (W, D, v, t, cm, pns, pnr) ∈ 
2^I × 2^I × V × T × CMap × N × N. 


Proof. This proof is identical to that of Lemma 3.1, since the format of messages sent by Reader-Writer 
automata in LL-RAMBO is as in L-RAMBO. 


We continue by showing properties of messages delivered by Reader-Writer processes. 


Lemma 4.2: Consider a step (s, π, s') of an execution α of LL-RAMBO, where π = 
recv((W, D, v, t, cm, p_j, p_i))_{j,i}, for i,j ∈ I, and p_i > s.ig(j)_i.p-ack. Then, (a) s.ig(j)_i.p-ack = 
s.hs-pack(i,j,p_i), (b) s.ig(j)_i.w-unack ⊆ s.hs-wunack(i,j,p_i), and (c) s.ig(j)_i.d-unack ⊆ 
s.hs-dunack(i,j,p_i). 


Proof. We prove the three parts separately: 


Part (a). Assume for contradiction that p_i > s.ig(j)_i.p-ack and s.hs-pack(i,j,p_i) ≠ s.ig(j)_i.p-ack. By the 
code of the algorithm and the monotonicity of pnum1, the only possibility is that s.hs-pack(i,j,p_i) < 
s.ig(j)_i.p-ack. This suggests that there must be a receive event recv((W, D, v, t, cm, p_j, p))_{j,i}, where 
p > s.hs-pack(i,j,p_i) (hence hrecv-W(j,i,p) and hrecv-D(j,i,p) are defined), that resulted in the value 
of s.ig(j)_i.p-ack. By the code of the recv_{j,i} action, for i,j ∈ I, ig(j)_i.p-ack is assigned the phase number 
that i has during the receive event. Hence, by the phase number paradigm, we have that s.ig(j)_i.p-ack ≥ p_i, 
which contradicts our initial assumption. 


Part (b). From part (a) we have that hs-pack(i,j,p_i) = s.ig(j)_i.p-ack. From the code it follows that if 
ig(j)_i.p-ack does not change then the membership of the set ig(j)_i.w-unack can only be reduced (the “if 
pnr > ig(j)_i.p-ack then” statement is not executed). Therefore, s.ig(j)_i.w-unack ⊆ s.hs-wunack(i,j,p_i). 

Part (c). Similar to part (b). From part (a) we have that hs-pack(i,j,p_i) = s.ig(j)_i.p-ack. From the 
code it follows that if ig(j)_i.p-ack does not change then the membership of the set ig(j)_i.d-unack can only 
be reduced (the “if pnr > ig(j)_i.p-ack then” statement is not executed). Therefore, s.ig(j)_i.d-unack ⊆ 
s.hs-dunack(i,j,p_i). 


We now state and prove four invariants that lead to the proof of atomicity of LL-RAMBO. The first 
invariant states that a node does not send information to another node that the first node does not already 
possess. 


Invariant 1: For all states s of any execution α of LL-RAMBO: 
((W, D, v, t, cm, pns, pnr), i, j) ∈ s.h-MSG ⇒ W ⊆ s.world_i ∧ D ⊆ s.departed_i. 


Proof. The proof is by induction on the length of the execution. The base case is trivial because all 
sets are empty in the initial state. Assume the invariant holds for state s and consider step (s, π, s'). 


1. π = recv((W, D, v, t, cm, pns, pnr))_{j,i}. By Lemma 4.1, ((W, D, v, t, cm, pns, pnr), j, i) ∈ h-MSG. By 
inductive hypothesis and the monotonicity of the sets world_i and departed_i, in state s', the invariant 
is maintained. 


2. π = recv(join)_{j,i}. By Lemma 4.1, (join, j, i) ∈ h-MSG. By the effects of this action, s.world_i ⊆ s'.world_i 
and s.departed_i = s'.departed_i. By inductive hypothesis and the monotonicity of the sets world_i and 
departed_i, in state s', the invariant is maintained. 


3. π = recv(leave)_{j,i}. By Lemma 4.1, (leave, j, i) ∈ h-MSG. By the effects of this action, s.world_i = 
s'.world_i and s.departed_i ⊆ s'.departed_i. By inductive hypothesis and the monotonicity of the sets 
world_i and departed_i, in state s', the invariant is maintained. 


4. π = send(m)_{i,j}. By the code of this action m = (W, D, v, t, cm, pns, pnr), such that W = 
s.world_i − s.ig(j)_i.w-known and D = s.departed_i − s.ig(j)_i.d-known. Therefore, W ⊆ s.world_i and 
D ⊆ s.departed_i. Observe that the effects of this action do not change world_i and departed_i. Hence 
W ⊆ s'.world_i and D ⊆ s'.departed_i. By inductive hypothesis, adding (m, i, j) to h-MSG 
maintains the invariant. 


5. Other actions do not change the variables involved in the invariant. So, by inductive hypothesis, in 
state s', the invariant continues to hold. 


This completes the proof. 


The following invariant states that the information that i expects j to acknowledge does not exceed the 
information that actually should be acknowledged. 


Invariant 2: For all states s of any execution α of LL-RAMBO: 
(a) ∀ i,j ∈ I: s.ig(j)_i.w-unack ⊆ s.world_i − s.ig(j)_i.w-known. 
(b) ∀ i,j ∈ I: s.ig(j)_i.d-unack ⊆ s.departed_i − s.ig(j)_i.d-known. 


Proof. We prove part (a) of the invariant. The proof of part (b) is analogous: the arguments are made on 
departed-related variables instead of world-related variables. 

The proof is by induction on the length of the execution. The base case is trivial because all sets 
are empty in the initial state. Assume the invariant (part (a)) holds for state s and consider step (s, π, s'). 


1. π = recv((W, D, v, t, cm, pns, pnr))_{j,i}. By Lemma 4.1, ((W, D, v, t, cm, pns, pnr), j, i) ∈ s.h-MSG. We 
consider 4 cases: 


(i) z ∈ W ∧ z ∉ s.world_i ∧ z ≠ j. This is the first time node i learns about node z, indirectly from 
j. By the initial value of the ig(z)_i record and the monotonicity of world_i, the invariant (part (a)) is 
maintained. 


(ii) j ∉ s.world_i. This is the first time i learns about j, directly from j. Similarly to case 1(i), 
ig(j)_i.w-unack and ig(j)_i.w-known are initialized to empty. Also ig(j)_i.p-ack is set to zero. 
Before the first inner “if-statement”, s'.world_i = s.world_i ∪ W (note that j is now in s'.world_i). 
Also, s'.ig(j)_i.w-known = ig(j)_i.w-known ∪ W and s'.ig(j)_i.w-unack = ig(j)_i.w-unack − W. Since 
j ∉ s.world_i, by the code of the send action i never sent messages to j, and hence pnr is zero. 
Therefore, the “if pnr > ig(j)_i.p-ack then” statement is not executed. By inductive hypothesis 
(part (a)), the invariant (part (a)) is reestablished. 


(iii) j ∈ s.world_i ∧ pnr ≤ s.ig(j)_i.p-ack. Since pnr ≤ s.ig(j)_i.p-ack, this implies that node i has learned 
about and communicated with node j in some earlier step of the execution. By the effects of this action, 
s'.world_i = s.world_i ∪ W, s'.ig(j)_i.w-known = s.ig(j)_i.w-known ∪ W, and s'.ig(j)_i.w-unack = 
s.ig(j)_i.w-unack − W. Then, 

    s'.world_i − s'.ig(j)_i.w-known = (s.world_i ∪ W) − (s.ig(j)_i.w-known ∪ W) 
                                    = s.world_i − s.ig(j)_i.w-known − W. 

By inductive hypothesis (part (a)), 

    s.ig(j)_i.w-unack − W ⊆ s.world_i − s.ig(j)_i.w-known − W. 

Therefore, 

    s'.ig(j)_i.w-unack ⊆ s'.world_i − s'.ig(j)_i.w-known. 

Thus, the invariant (part (a)), in state s', is maintained. 


(iv) j ∈ s.world_i ∧ pnr > s.ig(j)_i.p-ack. Using similar arguments as in case 1(iii) we have that 
up to the “if pnr > ig(j)_i.p-ack then” statement the invariant (part (a)) holds. Since 
pnr > s.ig(j)_i.p-ack the “if-statement” is executed, and by its effects we have s'.ig(j)_i.w-unack = 
s'.world_i − s'.ig(j)_i.w-known. Hence, the invariant (part (a)) is reestablished. 


2. π = recv(join)_{j,i}. By Lemma 4.1, (join, j, i) ∈ s.h-MSG. By the code of this action s.world_i ⊆ s'.world_i. 
If this is the first time node i learns about node j then by the initial assignment of the ig(j)_i record 
and the monotonicity of world_i, in state s', the invariant (part (a)) holds. On the other hand, if 
j ∈ s.world_i then by the inductive hypothesis (part (a)) and the monotonicity of world_i, in state s', 
the invariant (part (a)) holds. 


3. Other actions do not change the variables involved in the invariant (part (a)). So, by inductive 
hypothesis (part (a)), in state s', the invariant (part (a)) continues to hold. 


This completes the proof of part (a). As mentioned at the beginning of the proof, part (b) is shown in a 
similar manner. 
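
The set manipulation in case 1(iii) of the proof above can be checked mechanically on a small universe. The following Python sketch is only a sanity check of that one step, not a replacement for the inductive argument; the helper names subsets and check_case_iii are hypothetical.

```python
from itertools import combinations


def subsets(universe):
    """Yield every subset of the given finite set."""
    items = sorted(universe)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield set(combo)


def check_case_iii(universe):
    """Exhaustively check the step of case 1(iii); return the number of cases examined."""
    checked = 0
    for world in subsets(universe):
        for known in subsets(universe):
            for W in subsets(universe):
                # Set identity used in the derivation.
                assert (world | W) - (known | W) == world - known - W
                # Inductive hypothesis: unack is a subset of world - known.
                for unack in subsets(world - known):
                    # Conclusion: the invariant holds for the updated sets.
                    assert unack - W <= (world | W) - (known | W)
                    checked += 1
    return checked


total = check_case_iii({1, 2, 3})
```

The check enumerates every choice of world, w-known, W, and w-unack over a three-element universe and raises an AssertionError if any of them violates the step.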


In the next invariant we use history variables to show that if i sends a message to j when it believes that 
j's phase number is p then the information that i needs j to acknowledge does not exceed the information 
included in that message. 


Invariant 3: For all states s of any execution α of LL-RAMBO: 
(a) ((W, D, v, t, cm, p, pnr), i, j) ∈ s.h-MSG ⇒ s.hs-wunack(i,j,p) ⊆ W 
(b) ((W, D, v, t, cm, p, pnr), i, j) ∈ s.h-MSG ⇒ s.hs-dunack(i,j,p) ⊆ D 


Proof. The proof is by induction on the length of the execution. The base case is trivial because all 
sets are empty in the initial state. Assume the invariant holds for state s and consider step (s, π, s'). 


1. π = send((W, D, v, t, cm, pns, pnr))_{i,j}. By the effects of this action we have that W = 
s'.world_i − s'.ig(j)_i.w-known and s'.hs-wunack(i,j,p) = s'.ig(j)_i.w-unack is defined. Also, D = 
s'.departed_i − s'.ig(j)_i.d-known and s'.hs-dunack(i,j,p) = s'.ig(j)_i.d-unack is defined. By 
Invariant 2 we have that s'.ig(j)_i.w-unack ⊆ s'.world_i − s'.ig(j)_i.w-known, and s'.ig(j)_i.d-unack ⊆ 
s'.departed_i − s'.ig(j)_i.d-known. Thus, by substitution we get that s'.hs-wunack(i,j,p) ⊆ W and 
s'.hs-dunack(i,j,p) ⊆ D, as desired. Therefore, by inductive hypothesis, in state s', the invariant is 
reestablished. 


2. Other actions do not change the variables involved in the invariant. So, by inductive hypothesis, in 
state s’, the invariant continues to hold. 


This completes the proof. 


We use Lemmas 4.1 and 4.2, and Invariants 1, 2, and 3 to show the key Invariant 4 for the atomicity of 
LL-RAMBO. Here we show that node i never overestimates the knowledge possessed by j. 


Invariant 4: For all states s of any execution α of LL-RAMBO: 
(a) ∀ i,j ∈ I: s.ig(j)_i.w-known ⊆ s.world_j, 
(b) ∀ i,j ∈ I: s.ig(j)_i.d-known ⊆ s.departed_j. 


Proof. We prove part (a) of the invariant. The proof of part (b) is analogous: the arguments are made on 
departed-related variables instead of world-related variables. 

The proof is by induction on the length of the execution. The base case is trivial because all sets 
are empty in the initial state. Assume the invariant (part (a)) holds for state s and consider step (s, π, s') 
(the following discussion involves only part (a) of the invariant). 


1. π = recv((W, D, v, t, cm, pns, pnr))_{*,j}. By Lemma 4.1, ((W, D, v, t, cm, pns, pnr), *, j) ∈ s.h-MSG. The 
set s.ig(j)_i.w-known, for i ∈ I, i ≠ j, is not updated by the effects of this action. Observe that 
s.world_j ⊆ s'.world_j (for i = j the invariant is trivially maintained). By inductive hypothesis and the 
monotonicity of world_j, in state s', the invariant is maintained. 


2. π = recv(join)_{*,j}. By Lemma 4.1, (join, *, j) ∈ s.h-MSG. The discussion in this case is identical to that 
of case 1. 


3. π = recv((W, D, v, t, cm, pns, pnr))_{j,i}. By Lemma 4.1, ((W, D, v, t, cm, pns, pnr), j, i) ∈ s.h-MSG. We 
consider 4 cases: 


(i) z ∈ W ∧ z ∉ s.world_i ∧ z ≠ j. This is the first time node i learns about node z, indirectly from j. 
By the initial value of the ig(z)_i record and the monotonicity of world_i, the invariant is maintained. 
(This is for all nodes other than j. We consider j in the remaining three subcases.) 


(ii) j ∉ s.world_i. This is the first time i learns about j, directly from j. Similarly to case 
3(i), ig(j)_i.w-unack and ig(j)_i.w-known are initialized to empty. Also ig(j)_i.p-ack is set to 
zero. Before the “if-statement”, s'.world_i = s.world_i ∪ W (note that j is now in s'.world_i). 
Also, s'.ig(j)_i.w-known = ig(j)_i.w-known ∪ W (ig(j)_i.w-known = W) and s'.ig(j)_i.w-unack = 
ig(j)_i.w-unack − W (ig(j)_i.w-unack remains empty). Since j ∉ s.world_i, by the code of the send 
action i never sent messages to j, and hence pnr is zero. Therefore, the “if pnr > ig(j)_i.p-ack 
then” statement is not executed. By substitution s'.ig(j)_i.w-known = W. From Invariant 1 
we have that W ⊆ s.world_j = s'.world_j. Hence s'.ig(j)_i.w-known ⊆ s'.world_j, as desired. By 
inductive hypothesis, in s', the invariant is maintained. 


(iii) j ∈ s.world_i ∧ pnr ≤ s.ig(j)_i.p-ack. Since pnr ≤ s.ig(j)_i.p-ack, this implies that node i has 
learned about and communicated with node j in some earlier step of the execution. By the effects 
of this action s'.ig(j)_i.w-known = s.ig(j)_i.w-known ∪ W. From Invariant 1 we have that W ⊆ 
s.world_j = s'.world_j. Also by inductive hypothesis s.ig(j)_i.w-known ⊆ s'.world_j. Therefore, 
s'.ig(j)_i.w-known ⊆ s'.world_j, as desired. Thus, in s', the invariant is maintained. 


(iv) j ∈ s.world_i ∧ pnr > s.ig(j)_i.p-ack. Using similar arguments as in case 3(iii) we have 
that up to the “if pnr > ig(j)_i.p-ack then” statement, ig(j)_i.w-known = s.ig(j)_i.w-known ∪ W and 
ig(j)_i.w-unack = s.ig(j)_i.w-unack − W. Since pnr > s.ig(j)_i.p-ack the “if-statement” is 
executed; by its effects we get s'.ig(j)_i.w-known = ig(j)_i.w-known ∪ ig(j)_i.w-unack. Recall 
that ig(j)_i.w-known = s.ig(j)_i.w-known ∪ W. By Invariant 1 and inductive hypothesis, 
ig(j)_i.w-known ⊆ s'.world_j. 

Since pnr is contained in the message from j to i, we have that s.hrecv-W(i,j,pnr) must 
be defined; moreover, by the code of the algorithm, the monotonicity of the world variable, 
and the definition of hrecv-W(i,j,pnr), we have that s.hrecv-W(i,j,pnr) ⊆ s'.world_j. 
By the properties of the Channel automata we have that s.hsent-W(i,j,pnr) must also 
be defined and its value equals the value of s.hrecv-W(i,j,pnr). By Invariant 3 we 
have that s.hs-wunack(i,j,pnr) ⊆ s.hsent-W(i,j,pnr). By Lemma 4.2(b) we have that if 
pnr > s.ig(j)_i.p-ack then s.ig(j)_i.w-unack ⊆ s.hs-wunack(i,j,pnr). By substitution we have 
s.ig(j)_i.w-unack ⊆ s'.world_j. 

Therefore, s'.ig(j)_i.w-known ⊆ s'.world_j, as desired. Thus, by inductive hypothesis, in s', the 
invariant is reestablished. 


4. Other actions do not change the variables involved in the invariant. So, by inductive hypothesis, in 
state s’, the invariant continues to hold. 


This completes the proof of part (a). As mentioned at the beginning of the proof, part (b) is shown in a 
similar manner. 
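
Invariants 2(a) and 4(a) can also be exercised on randomized executions. The sketch below is a simplified model built from the prose description of the send and receive effects, restricted to the world set; the classes, the single seed node known to everyone, the channel model (a list from which messages are removed in random order and sometimes dropped), and all parameter values are assumptions of the example rather than part of the specification in Fig. 3.

```python
import random
from dataclasses import dataclass, field


@dataclass
class IG:
    w_known: set = field(default_factory=set)
    w_unack: set = field(default_factory=set)
    p_ack: int = 0


class Node:
    def __init__(self, ident, seed_peer):
        self.ident = ident
        self.world = {ident, seed_peer}
        self.pnum1 = 0   # own phase number
        self.pnum2 = {}  # highest phase number received from each peer
        self.ig = {}     # peer -> IG record

    def send(self, j):
        rec = self.ig.setdefault(j, IG())
        msg = (self.ident, j, self.world - rec.w_known,
               self.pnum1, self.pnum2.get(j, 0))
        self.pnum1 += 1
        return msg

    def recv(self, msg):
        j, _, W, pns, pnr = msg
        rec = self.ig.setdefault(j, IG())
        self.world |= W
        rec.w_known |= W
        rec.w_unack -= W
        self.pnum2[j] = max(self.pnum2.get(j, 0), pns)
        if pnr > rec.p_ack:  # fresh (acknowledgment) message
            rec.w_known |= rec.w_unack
            self.pnum1 += 1
            rec.p_ack = self.pnum1
            rec.w_unack = self.world - rec.w_known


def check_invariants(nodes):
    for i in nodes.values():
        for j, rec in i.ig.items():
            assert rec.w_unack <= i.world - rec.w_known  # Invariant 2(a)
            assert rec.w_known <= nodes[j].world         # Invariant 4(a)


def run(num_nodes=5, steps=4000, loss=0.3, seed=1):
    rng = random.Random(seed)
    nodes = {k: Node(k, seed_peer=0) for k in range(num_nodes)}
    channel = []
    for _ in range(steps):
        if channel and rng.random() < 0.5:
            msg = channel.pop(rng.randrange(len(channel)))  # arbitrary delivery order
            if rng.random() >= loss:                        # otherwise the message is lost
                nodes[msg[1]].recv(msg)
        else:
            i = nodes[rng.randrange(num_nodes)]
            peers = sorted(i.world - {i.ident})
            if peers:
                channel.append(i.send(rng.choice(peers)))
        check_invariants(nodes)
    return nodes


nodes = run()
```

The run raises an AssertionError as soon as either invariant is violated, so completing without error gives some evidence that the reconstruction of the receive rule is consistent with the invariants even when messages are lost or reordered.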


Finally we show the atomicity of objects implemented by LL-RAMBO by proving that it simulates 
L-RAMBO, i.e., by showing that every trace of LL-RAMBO is a trace of L-RAMBO (hence of RAMBO). 


Theorem 4.3: LL-RAMBO implements atomic read/write objects. 


Proof. We show that LL-RAMBO simulates L-RAMBO. Specifically, we show that there exists a simulation 
relation R from LL-RAMBO automaton to L-RAMBO automaton that satisfies Definition 3.2. Observe that 
LL-RAMBO and L-RAMBO have the same external signatures. 

We denote by MSG_LR the set of messages in the channel automata of L-RAMBO and by MSG_LLR the 
set of messages in the channel automata of LL-RAMBO. 

We define the simulation relation R to map: 


(a) a state t of LL-RAMBO to a state s of L-RAMBO so that every “common” state variable has the 
same value. For example, for node i ∈ I, t.world_i = s.world_i, t.departed_i = s.departed_i, t.pnum1_i = 
s.pnum1_i, t.cmap_i = s.cmap_i, etc. 


(b) a message m = (W, D, v, t, cm, pns, pnr) ∈ Channel_{i,j}.MSG_LLR to a message m' = 
(W, D, v, t, cm, pns, pnr) ∈ Channel_{i,j}.MSG_LR so that: (i) m.v = m'.v, (ii) m.t = m'.t, (iii) 
m.cm = m'.cm, (iv) m.pns = m'.pns, (v) m.pnr = m'.pnr, (vi) m.W = hs-world(i,j,pns) − hs-wknow(i,j,pns) 
and m'.W = hs-world(i,j,pns), and lastly (vii) m.D = hs-departed(i,j,pns) − hs-dknow(i,j,pns) 
and m'.D = hs-departed(i,j,pns). (We assume that the 
history variables hs-world(i,j,pns) and hs-departed(i,j,pns) are used in L-RAMBO in the same 
manner that they are used in LL-RAMBO.) 


Recall that the difference between the two algorithms is in the Reader-Writer automata. Therefore, in 
order to show that LL-RAMBO simulates L-RAMBO, we focus only on transitions related to the Reader- 
Writer automata. We now show that R satisfies Definition 3.2: 


1. If t is an initial state of LL-RAMBO then there exists an initial state s of L-RAMBO such that s ∈ 
R(t), since all common state variables have the same initial values. For example, ∀ i ∈ I, t.world_i = 
s.world_i = ∅, t.departed_i = s.departed_i = ∅, t.pnum1_i = s.pnum1_i = 0, etc. Also, the channels do not 
contain any messages. 


2. Suppose t and s are reachable states of LL-RAMBO and L-RAMBO respectively such that s ∈ R(t) and 
that (t, π, t'). We show that there exists a state s' ∈ R(t') such that there is an execution fragment of 
L-RAMBO that has the same trace as π. 


(a) If π = recv(m)_{j,i}, i,j ∈ I, where m = (W, D, v, t, cm, p, pnr). By Lemma 4.1, m ∈ s.h-MSG. 
Let s' be such that (s, recv(m')_{j,i}, s'). Both actions have empty trace, since recv(m)_{j,i} (resp. 
recv(m')_{j,i}) is considered internal with respect to the composition of automata that comprises 
automaton LL-RAMBO (resp. L-RAMBO). 

Now, since m = (W, D, v, t, cm, p, pnr) was in Channel_{j,i}.MSG_LLR, by the message correspondence 
of R, message m' = (W, D, v, t, cm, p, pnr) is in Channel_{j,i}.MSG_LR, and m and m' have the 
mapping defined above. By inspection of the code of both algorithms, it follows that besides state 
variables world_i and departed_i, all other common state variables are updated identically, as the 
information obtained from messages m and m' is the same for these variables. Also, the message 
correspondence is not affected, since no messages are sent. 

Hence we focus on showing that t'.world_i = s'.world_i and t'.departed_i = s'.departed_i. From the 
code of LL-RAMBO, we have that t'.world_i = t.world_i ∪ m.W and t'.departed_i = t.departed_i ∪ 
m.D. Since m was received (and hence removed from the channel), by the properties of the 
channel, there was a send event that placed m in the channel so that m.W = t.hsent-W(j,i,p) = 
t.hs-world(j,i,p) − t.hs-wknow(j,i,p) and m.D = t.hsent-D(j,i,p) = t.hs-departed(j,i,p) − 
t.hs-dknow(j,i,p). From Invariant 4 and the monotonicity of variables world and departed we 
have that hs-wknow(j,i,p) ⊆ t.world_i and hs-dknow(j,i,p) ⊆ t.departed_i. From this and the 
fact that t.hs-wknow(j,i,p) ⊆ t.hs-world(j,i,p) and t.hs-dknow(j,i,p) ⊆ t.hs-departed(j,i,p), 
we conclude that t'.world_i = t.world_i ∪ t.hs-world(j,i,p) and t'.departed_i = t.departed_i ∪ 
t.hs-departed(j,i,p). But by the state and message correspondence of R we have that t.world_i = 
s.world_i, t.departed_i = s.departed_i, m'.W = t.hs-world(j,i,p), and m'.D = t.hs-departed(j,i,p). 
Hence, t'.world_i = s.world_i ∪ m'.W = s'.world_i and t'.departed_i = s.departed_i ∪ m'.D = s'.departed_i, 
as desired. 


(b) If π = send(m)_{i,j}, i,j ∈ I, where m = (W, D, v, t, cm, p, pnr). Let s' be such that 
(s, send(m')_{i,j}, s'). Both actions have empty trace, since send(m)_{i,j} (resp. send(m')_{i,j}) is 
considered internal in the composition of automata that comprises automaton LL-RAMBO (resp. 
L-RAMBO). 

From the code of LL-RAMBO we have that m = (W, D, v, t, cm, pns, pnr) is placed 
in Channel_{i,j}.MSG_LLR where m.W = hs-world(i,j,pns) − hs-wknow(i,j,pns), m.D = 
hs-departed(i,j,pns) − hs-dknow(i,j,pns), m.v = t.value_i, m.cm = t.cmap_i, m.pns = t.pnum1_i, 
and m.pnr = t.pnum2(j)_i. From the state correspondence, since send(m)_{i,j} is enabled, 
send(m')_{i,j} is also enabled. Furthermore, send(m')_{i,j} places m' = (W, D, v, t, cm, pns, pnr) in 
Channel_{i,j}.MSG_LR where m'.W = hs-world(i,j,pns), m'.D = hs-departed(i,j,pns), m'.v = 
s.value_i, m'.cm = s.cmap_i, m'.pns = s.pnum1_i, and m'.pnr = s.pnum2(j)_i. From the 
state correspondence of R for t and s we conclude that the message correspondence of R 
for t' and s' is preserved. Also, the state correspondence for t' and s' is preserved, since 
t'.pnum1_i = t.pnum1_i + 1 = s.pnum1_i + 1 = s'.pnum1_i and all other common variables remain 
unchanged. 


(c) If π is an action besides send(m)_{i,j} or recv(m)_{j,i}, then we choose s' such that (s, π, s'). That 
is, we simulate the same action. By inspection of the code of the algorithms it follows that any 
state change after π occurs is identical for both algorithms, hence the state correspondence and 
message correspondence given by R are preserved. 


Therefore, R is a simulation mapping from LL-RAMBO to L-RAMBO per Definition 3.2. By Theorem 3.3, 
any trace of LL-RAMBO is a trace of L-RAMBO. Since L-RAMBO implements atomic read/write objects per 
Theorem 3.4, so does LL-RAMBO. 
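
The key step of case (a) in the proof above is that omitting already-known identifiers from a message does not change the receiver's resulting world set. The following Python sketch checks this exhaustively on a small universe; the helper names are hypothetical, and the two containment assumptions used in the loop are exactly those cited in the proof (hs-wknow is contained in both hs-world and the receiver's world).

```python
from itertools import combinations


def subsets(universe):
    """Yield every subset of the given finite set."""
    items = sorted(universe)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield set(combo)


def check_recv_step(universe):
    """Compare the receive updates of LL-RAMBO and L-RAMBO; return the number of cases examined."""
    cases = 0
    for world in subsets(universe):            # receiver's world before the step
        for hs_world in subsets(universe):     # sender's world at send time
            # hs_wknow is a subset of hs_world (by construction) and of the
            # receiver's world (Invariant 4 with monotonicity).
            for hs_wknow in subsets(hs_world & world):
                m_W = hs_world - hs_wknow      # component W of the LL-RAMBO message
                m_prime_W = hs_world           # component W of the L-RAMBO message
                assert (world | m_W) == (world | m_prime_W)
                cases += 1
    return cases


cases = check_recv_step({1, 2, 3})
```

The same check applies verbatim to the departed component, with departed-related sets in place of world-related ones.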


5 LL-RAMBO Performance 


In this section we analyze the performance of LL-RAMBO. We perform a conditional analysis of read and 
write operation latency, and we assess the improvement in communication for specific scenarios. 


5.1 Conditional read/write operation latency analysis. 


A conditional analysis of RAMBO read and write operation latency is presented in [2]. Here we show that 
under the same conditions LL-RAMBO has the same operation latency. We start by giving relevant definitions 
(based on [2] and [32]). Let d denote the maximum message delivery latency. Let d also be the interval at 
which the gossip messages are sent. An execution with times associated with all events is called a timed 
execution. A timed execution is said to be admissible if the following condition holds: “If timed execution ξ 
is an infinite sequence, then the times of the actions approach ∞. If ξ is a finite sequence, then in the final 
state of ξ, every enabled task must be allowed to complete” (for more details see [32]). 

Let $\alpha$ be an admissible timed execution, and let $\alpha'$ be a finite prefix of $\alpha$. Let $\ell\mathit{time}(\alpha')$ denote the time
of the last event in $\alpha'$. We say $\alpha$ is an $\alpha'$-normal execution if (i) after $\alpha'$, the local clocks of all automata
progress at exactly the rate of real time, (ii) no message sent in $\alpha$ after $\alpha'$ is lost, and (iii) if a message is
sent at time $t$ in $\alpha$ and it is delivered, then it is delivered by the time $\max\{t + d, \ell\mathit{time}(\alpha') + d\}$.
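Condition (iii) can be read as a simple deadline computation. The following is a minimal sketch, assuming times are plain numbers; the function and parameter names are illustrative and do not come from the paper.

```python
def delivery_deadline(send_time: float, prefix_end: float, d: float) -> float:
    """Latest delivery time allowed by condition (iii) of an alpha'-normal execution.

    send_time  -- time t at which the message is sent
    prefix_end -- ltime(alpha'), the time of the last event of the prefix
    d          -- maximum message delivery latency
    """
    # Messages sent during the prefix only need to arrive by ltime(alpha') + d;
    # messages sent afterwards must arrive within d of being sent.
    return max(send_time + d, prefix_end + d)
```

A message sent before the prefix ends therefore gets extra slack, while every later message is bounded by $d$.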

LL-RAMBO allows sending of gossip messages at arbitrary times. For the purpose of latency analysis, we 
restrict the sending pattern: we assume that each automaton sends messages at the first possible time and 
at regular intervals of d thereafter, as measured on the local clock. Also, non-send locally controlled events 
occur just once, within time 0 on the local clock. 

As with all quorum-based algorithms, operation liveness depends on all the nodes in some quorums
remaining alive and not departing. We say that a configuration is installed when every member of the
configuration has been notified about the configuration. We say that an execution $\alpha$ is $(\alpha', e, \tau)$-configuration-viable
if for every installed configuration, there exists a read-quorum, $R$, and a write-quorum, $W$, such that
no process in $R \cup W$ fails or departs before the maximum of (i) time $\tau$ after the next configuration is installed,
and (ii) $\ell\mathit{time}(\alpha') + e + \tau$.

We say that execution $\alpha$ satisfies $(\alpha', \tau)$-recon-spacing if after $\alpha'$, at least time $\tau$ elapses between the event
that reports a new configuration $c$ ($report(c)_i$) and any following event that proposes a new configuration
($recon(c, *)_i$). In other words, after $\alpha'$, when the system stabilizes, reconfigurations are not too frequent.
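To make the recon-spacing condition concrete, the sketch below checks it on a recorded list of timed events. The `Event` record and its field names are hypothetical; the paper defines the condition on executions, not on any particular trace format.

```python
from typing import NamedTuple


class Event(NamedTuple):
    time: float
    kind: str    # "report" for report(c)_i, "recon" for recon(c, *)_i
    node: str    # the node i at which the event occurs
    config: str  # the configuration c named by the event


def satisfies_recon_spacing(events, prefix_end, tau):
    """Return True if every recon(c, *)_i after the prefix occurs at least
    tau after the report(c)_i event at the same node."""
    report_time = {}
    for ev in sorted(events, key=lambda e: e.time):
        key = (ev.node, ev.config)
        if ev.kind == "report":
            # Keep the first report of c at node i.
            report_time.setdefault(key, ev.time)
        elif ev.kind == "recon" and ev.time > prefix_end:
            reported = report_time.get(key)
            if reported is not None and ev.time - reported < tau:
                return False
    return True
```

With $\tau = 8d$ as assumed later in this section, a proposal issued less than $8d$ after the corresponding report violates the condition.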

Execution $\alpha$ is said to satisfy $(\alpha', e)$-join-connectivity if after $\alpha'$, any two nodes that both joined the
system at time $t - e$ know about each other by time $t$.

Execution $\alpha$ satisfies $(\alpha', e + \tau)$-recon-readiness if after $\alpha'$, every $recon(c)$ event proposing a new
configuration includes a node $i$ in $c$ only if $i$ joined at least time $e + \tau$ ago. This, in conjunction with
$(\alpha', e)$-join-connectivity, ensures that all the nodes in active configurations are aware of each other.

As in [2], we assume that $\alpha$ is an $\alpha'$-normal execution satisfying $(\alpha', e, 23d)$-configuration-viability,
$(\alpha', 8d)$-recon-spacing, $(\alpha', e)$-join-connectivity, and $(\alpha', e + d)$-recon-readiness. (See [2] for a further
discussion of these assumptions.) With this we show conditionally that read and write operations take no more
than $8d$ time.


Theorem 5.1: Let $\alpha$ be an $\alpha'$-normal execution of LL-RAMBO satisfying join-connectivity, recon-readiness,
recon-spacing, and configuration-viability. Let $t > \ell\mathit{time}(\alpha') + e + d$. Assume $i$ is a node that received a
join-ack$_i$ prior to time $t - e - d$, and neither fails nor departs in $\alpha$ until after time $t + 8d$. Then if a read or
write operation starts at node $i$ at time $t$, it completes by time $t + 8d$.


The proof is essentially identical to the proof of Theorem 5.3 in [2]. The key observation is that under
the assumed conditions the incremental gossip does not affect the pattern of messages. The only difference
is that these messages do not contain information that has already been propagated.


5.2 Communication Efficiency Analysis 


In this section we illustrate the communication savings attained by LL-RAMBO as compared to RAMBO. 
The savings are assessed both in terms of gossip message size and number of gossip messages. We present 
communication analysis of $\alpha'$-normal timed executions $\alpha$, where the prefix $\alpha'$ satisfies specific properties.
Following the prefix $\alpha'$, we assume that $\alpha$ is divided into rounds of length $d$, where $d$ is greater than the
maximum message latency. The active nodes send gossip messages at the beginning of each round and
subsequently receive gossip messages sent during the current round. We assume that in all executions node
failures and departures do not disable the quorum systems used by the algorithms.

We observe that given any timed execution $\alpha$ of LL-RAMBO, it is possible to construct an execution
$\hat{\alpha}$ of RAMBO that has the same interaction with its environment as $\alpha$ (as follows from Theorem 4.3), but
that includes additional gossip messages and gossip messages with different content, and that excludes leave
notification messages that are specific to LL-RAMBO. We also denote by $\hat{\alpha}'$ the prefix of $\hat{\alpha}$ that corresponds
to $\alpha'$. In the rest of this section we use this notation in comparing the communication efficiency of
the two algorithms in terms of gossip messaging.

Recall that gossip messages in RAMBO have the format $\langle W, v, t, cm, pns, pnr \rangle$, and in LL-RAMBO the
format is $\langle W, D, v, t, cm, pns, pnr \rangle$. We also assume that each node identifier is $\gamma$ bits long, and that the
values $v$, $t$, $cm$, $pns$, and $pnr$ altogether occupy $\delta$ bits (a constant) in gossip messages.


Scenario 1: No joins or departures after $\alpha'$. Here we make the following additional assumptions about
$\alpha'$ for LL-RAMBO and $\hat{\alpha}'$ for RAMBO. Consider the following sequence of events, identical for $\alpha'$ and $\hat{\alpha}'$.
The service is initialized by some node, called the creator. Next, $m$ new participants join the service by
sending $m$ join requests to the creator. After the last join request is received, the creator sends a gossip
message to each new participant. Once these gossip messages are received all nodes have active status. At
this point the cardinality of world for each node is $n = m + 1$. Now, $l$ nodes decide to leave the system. In
$\alpha'$ of LL-RAMBO these nodes send leave notification messages to all participating nodes. We assume that
these messages are delivered. These notification messages are of constant size and are much smaller than
any gossip message. Now the cardinality of the departed set at each node is $l$ in LL-RAMBO. In the case
of RAMBO, $l$ nodes leave by emulating crash failures. The number of active nodes is $a = n - l$.
This concludes $\alpha'$ (respectively $\hat{\alpha}'$), following which no new nodes join the system and no active nodes leave
the system. Following $\alpha'$ (respectively $\hat{\alpha}'$), normal timing holds in $\alpha$ (respectively $\hat{\alpha}$), and the active nodes
gossip at regular intervals as described above. We now give the result that compares the two algorithms for
$r$ rounds of gossip.


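Proposition 5.2: Let $\alpha$ be an $\alpha'$-normal execution of LL-RAMBO and $\hat{\alpha}$ be an $\hat{\alpha}'$-normal execution of RAMBO as
defined by Scenario 1. Then:

(a) there are $r \cdot a \cdot l$ fewer gossip messages in $\alpha$ following $\alpha'$ than in $\hat{\alpha}$ following $\hat{\alpha}'$;

(b) the bit complexity of gossip messages in $\alpha$ following $\alpha'$ is smaller by $((r-1) \cdot n^2 \cdot a \cdot \gamma) + (a \cdot l^2 \cdot \gamma) + (r \cdot a \cdot l \cdot \delta)$
than the bit complexity in $\hat{\alpha}$ following $\hat{\alpha}'$.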


Proof. Nodes participating in RAMBO cannot distinguish nodes that failed from nodes that departed.
Therefore, in each round of gossip RAMBO sends $a \cdot n$ messages (an active node sends gossip messages to
all nodes in its world). Each gossip message has the size of $n \cdot \gamma + \delta$ bits ($|W| = |world| = n$). Hence,
the gossip message complexity of RAMBO for $r$ rounds is $r \cdot a \cdot n$, and the gossip message bit complexity is
$r \cdot a \cdot n \cdot (n \cdot \gamma + \delta)$.

In LL-RAMBO each node has learned that $l$ nodes departed. Therefore, $a^2$ gossip messages are sent in
each of the $r$ rounds. In the first round, each gossip message has size $(n + l) \cdot \gamma + \delta$ (since nodes do not know
what other nodes know). Since all messages are delivered, at the end of the first round each node knows
that every active node has full knowledge of world and departed sets. Hence, in the following rounds the
gossip message size is $\delta$ bits. Therefore, the gossip message complexity of LL-RAMBO for $r$ rounds is $r \cdot a^2$,
and the gossip message bit complexity is $(a^2 \cdot (n + l) \cdot \gamma) + (r \cdot a^2 \cdot \delta)$.

Therefore, LL-RAMBO sends $r \cdot a \cdot l$ fewer gossip messages than RAMBO. The reduction for the gossip
message bit complexity is $((r-1) \cdot n^2 \cdot a \cdot \gamma) + (a \cdot l^2 \cdot \gamma) + (r \cdot a \cdot l \cdot \delta)$.
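The counting argument of this proof can be checked numerically. The sketch below tallies messages and bits round by round under the Scenario 1 assumptions ($a = n - l$ active nodes, identifiers of $\gamma$ bits, $\delta$ bits of other fields) and compares the result with the closed forms $r \cdot a \cdot l$ and $((r-1) \cdot n^2 \cdot a \cdot \gamma) + (a \cdot l^2 \cdot \gamma) + (r \cdot a \cdot l \cdot \delta)$. The function names are illustrative only.

```python
def scenario1_savings(n, l, r, gamma, delta):
    """Round-by-round tally of (messages saved, bits saved) by LL-RAMBO."""
    a = n - l  # number of active nodes
    rambo_msgs = rambo_bits = ll_msgs = ll_bits = 0
    for rnd in range(1, r + 1):
        # RAMBO: every active node gossips its whole world to all n nodes.
        rambo_msgs += a * n
        rambo_bits += a * n * (n * gamma + delta)
        # LL-RAMBO: only the a active nodes are addressed; the world and
        # departed sets are sent in full in the first round only.
        size = (n + l) * gamma + delta if rnd == 1 else delta
        ll_msgs += a * a
        ll_bits += a * a * size
    return rambo_msgs - ll_msgs, rambo_bits - ll_bits


def scenario1_closed_form(n, l, r, gamma, delta):
    """Closed forms stated in parts (a) and (b), valid for r >= 1."""
    a = n - l
    msgs = r * a * l
    bits = (r - 1) * n * n * a * gamma + a * l * l * gamma + r * a * l * delta
    return msgs, bits
```

For example, with $n = 10$, $l = 3$, $r = 5$, $\gamma = 32$ and $\delta = 128$ (values chosen only for illustration), both functions give 105 fewer messages and 105056 fewer bits.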


If $l$ is zero then the behavior of these two algorithms is identical. Otherwise, $l$ can be as small as one and
as large as $n - 1$, and since $a = n - l$, the savings in gossip messages for LL-RAMBO are between $\Omega(r \cdot n)$
and $O(r \cdot n^2)$. The reduction in bit complexity for LL-RAMBO is between $\Omega(r \cdot n^2)$ and $O(r \cdot n^3)$ ($\gamma$ and $\delta$
are assumed to be constants).


Scenario 2: Steady turnover rate after $\alpha'$. In this scenario, following the initial segment of an execution,
nodes join and leave at a constant rate per round in both RAMBO and LL-RAMBO. Consider the following
sequence of events for both systems. As in Scenario 1, the service is initialized by the creator. Then $m$ new
participants join the service by sending join requests to the creator. After the last join request is received,
the creator sends gossip to all participants. Once these gossip messages are received all nodes are active. At
this point the cardinality of world at each node is $n = m + 1$. This concludes $\alpha'$ of LL-RAMBO (respectively
$\hat{\alpha}'$ of RAMBO). Following $\alpha'$ (respectively $\hat{\alpha}'$), normal timing holds in $\alpha$ (respectively $\hat{\alpha}$), and the active
nodes gossip at the beginning of each round. In this scenario nodes join and leave the service at the same
rate: during each round following $\alpha'$ (respectively $\hat{\alpha}'$), $z$ new nodes join and $z$ active nodes leave the service.
We now give the result that compares the two algorithms for $r$ rounds of gossip.


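Proposition 5.3: Let $\alpha$ be an $\alpha'$-normal execution of LL-RAMBO and $\hat{\alpha}$ be an $\hat{\alpha}'$-normal execution of RAMBO as
defined by Scenario 2. Then:

(a) there are $(n - z) \cdot z \cdot ((r-1) \cdot (r-2)/2)$ fewer gossip messages in $\alpha$ following $\alpha'$ than in $\hat{\alpha}$ following $\hat{\alpha}'$;

(b) the bit complexity of gossip messages in $\alpha$ following $\alpha'$ for $r > 2$ is smaller by

$$(n - z) \cdot \left( \gamma \cdot z^2 \cdot \frac{(r-1)(r-2)(2r-3)}{6} + (2 \cdot \gamma \cdot n + \delta) \cdot z \cdot \frac{(r-1)(r-2)}{2} + \gamma \cdot n \cdot (n - z) \cdot (r - 2) + n^2 \cdot \gamma \right)$$

than the bit complexity in $\hat{\alpha}$ following $\hat{\alpha}'$. For $r = 2$ the reduction is $(n - z) \cdot n^2 \cdot \gamma$ and for $r = 1$ the
bit complexity is the same.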


Proof. For the analysis we assume that the join requests are received by all active nodes by the end of the
round in which they were sent, and are received after all gossip messages are sent. On the other hand, we
assume that the leave notifications are received (in LL-RAMBO) by active nodes at the end of the following
round. These assumptions yield worst case behavior for LL-RAMBO in Scenario 2 since delaying notification
messages only affects the gossip communication complexity of LL-RAMBO (in RAMBO no notification messages
are sent).

We now analyze the behavior of RAMBO for a round $1 \le k \le r$. By the code of RAMBO, we observe that
the world set grows as nodes join the system. Since gossip messages include the entire world set it follows
that the size of gossip messages is proportional to the size of world. In RAMBO nodes leave by emulating
failures. Since nodes cannot detect such failures, each active node attempts to gossip to all members in its
world, regardless of their failure status. Hence, as new participants join, the number and the size of gossip
messages sent in RAMBO increase.

We separately consider the case when $k = 1$. At the beginning of the first round the cardinality of the world of
each active node is $n$ (join requests have not yet been received). From this and the fact that only active nodes
gossip (newly joined nodes become active after they receive a gossip message), in the first round $n^2$ messages are
sent, each of size $n \cdot \gamma + \delta$. We now consider $2 \le k \le r$. At the beginning of the $k$th round there are $n - z$ active
nodes. Observe that up to the $k$th round $(k-1) \cdot z$ active nodes left the system, and the same number attempted
to join the system, of which only $(k-2) \cdot z$ became active by the $k$th round (it takes at most $2d$ time for a node
to become active). Hence, there may be at most $n - z$ active nodes and $n$ participating nodes in the
system at the beginning of each round $k$. The world set of each active node has cardinality $n + (k-1) \cdot z$
(since $z$ nodes enter the system in every round). Therefore, in each round $k > 1$, $(n - z) \cdot (n + (k-1) \cdot z)$
gossip messages are sent, each of size $(n + (k-1) \cdot z) \cdot \gamma + \delta$.


We now analyze, in a similar fashion, the behavior of LL-RAMBO for a round $1 \le k \le r$. By the code
of LL-RAMBO we observe that both the world and the departed sets grow as the new participants join and
active nodes leave the system. However, two nodes that have been active in two consecutive rounds need
only exchange gossip messages that contain the identifiers of the $z$ nodes that have departed and
the $z$ nodes that have joined the system in the previous round. On the other hand, an active node has to
send all current knowledge about world and departed to the newly joined nodes. Also note that once a node
learns that another node departed from the system it does not send any further gossip messages to that
node. We separately consider the cases $k = 1$ and $k = 2$. The argument for the first round is identical
to the argument for the first round of RAMBO. At the beginning of the second round there are $n - z$ active
nodes (this is because in the first round $z$ nodes departed and $z$ nodes joined the system, where the
newly joined nodes are not active yet). Also the cardinality of the world variable of each active node is $n + z$ and
the cardinality of the departed variable is zero (since the leave notifications have not been received yet; they will be
received at the end of this round). Therefore, in the second round, $(n - z) \cdot (n + z)$ gossip messages are sent,
and the total bit complexity of these messages is $(n - z) \cdot (n \cdot (z \cdot \gamma + \delta) + z \cdot ((n + z) \cdot \gamma + \delta))$. We now consider
$2 < k \le r$. At the beginning of round $k$ there are $n - z$ active nodes (using the same reasoning as in the case
of RAMBO for rounds $k \ge 2$). The world set of each active node has cardinality $n + (k-1) \cdot z$ (since $z$ nodes
enter the system in every round) and the departed set has cardinality $(k-2) \cdot z$ (since it takes an extra round
for the notifications to be received). It follows that in round $k > 2$, $(n - z) \cdot (n + z)$ gossip messages are sent,
and the total bit complexity of these messages is $(n - z) \cdot (n \cdot (2 \cdot z \cdot \gamma + \delta) + z \cdot ((n + (k-1) \cdot z + (k-2) \cdot z) \cdot \gamma + \delta))$.
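The idea that a node sends each peer only the identifiers that peer is not yet known to have can be sketched as follows. This is a deliberately simplified illustration, not the LL-RAMBO protocol itself: the class, its fields, and the explicit acknowledgement call are assumptions made for the example, and phase numbers, tags, and configuration maps are omitted.

```python
class IncrementalGossiper:
    """Tracks, per peer, which world/departed identifiers the peer already has."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.world = {node_id}
        self.departed = set()
        # peer -> (world ids, departed ids) that the peer is known to hold
        self.peer_knows = {}

    def recipients(self):
        # No further gossip is sent to nodes known to have departed.
        return self.world - self.departed

    def build_message(self, peer):
        known_world, known_departed = self.peer_knows.get(peer, (set(), set()))
        # Send only the increments; a newly joined peer receives everything.
        return self.world - known_world, self.departed - known_departed

    def on_ack(self, peer, sent_world, sent_departed):
        known_world, known_departed = self.peer_knows.setdefault(peer, (set(), set()))
        known_world |= sent_world
        known_departed |= sent_departed
```

Once a peer has acknowledged the full sets, later messages to it carry only the identifiers of nodes that joined or departed since, which is the source of the per-message savings counted above.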


By performing elementary mathematical computations, it follows that for $r$ rounds, LL-RAMBO sends
$(n - z) \cdot z \cdot ((r-1) \cdot (r-2)/2)$ fewer gossip messages than RAMBO. Also, it is not difficult to see that in the first
round the LL-RAMBO gossip bit complexity is the same as that of RAMBO, and the gossip bit complexity reduction for
the second round is $(n - z) \cdot n^2 \cdot \gamma$. Finally, for $r > 2$ the savings in the gossip bit complexity are

$$(n - z) \cdot \left( \gamma \cdot z^2 \cdot \frac{(r-1)(r-2)(2r-3)}{6} + (2 \cdot \gamma \cdot n + \delta) \cdot z \cdot \frac{(r-1)(r-2)}{2} + \gamma \cdot n \cdot (n - z) \cdot (r - 2) + n^2 \cdot \gamma \right).$$
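The "elementary computations" can be reproduced by summing the per-round quantities from the analysis above. The sketch below assumes that in every round $k \ge 2$ LL-RAMBO sends $(n - z) \cdot (n + z)$ messages, and that the sum of squares $1^2 + \dots + (r-2)^2$ appears in the closed form as $(r-1)(r-2)(2r-3)/6$, which is what the per-round sizes give. Function names are illustrative.

```python
def scenario2_savings(n, z, r, gamma, delta):
    """Round-by-round tally of (messages saved, bits saved) by LL-RAMBO."""
    active = n - z
    msgs = bits = 0
    for k in range(2, r + 1):  # round 1 is identical in both algorithms
        world = n + (k - 1) * z
        rambo_msgs = active * world
        rambo_bits = rambo_msgs * (world * gamma + delta)
        if k == 2:
            # Departures are not yet known: increments carry z joined ids.
            per_node = n * (z * gamma + delta) + z * ((n + z) * gamma + delta)
        else:
            departed = (k - 2) * z
            # Increments carry z joined and z departed ids; the z newly
            # joined nodes receive the full world and departed sets.
            per_node = (n * (2 * z * gamma + delta)
                        + z * ((world + departed) * gamma + delta))
        msgs += rambo_msgs - active * (n + z)
        bits += rambo_bits - active * per_node
    return msgs, bits


def scenario2_closed_form(n, z, r, gamma, delta):
    """Closed forms for r >= 2 (for r = 2 the bit term reduces to (n-z)*n^2*gamma)."""
    tri = (r - 1) * (r - 2) // 2
    squares = (r - 1) * (r - 2) * (2 * r - 3) // 6
    msgs = (n - z) * z * tri
    bits = (n - z) * (gamma * z * z * squares
                      + (2 * gamma * n + delta) * z * tri
                      + gamma * n * (n - z) * (r - 2)
                      + n * n * gamma)
    return msgs, bits
```

For instance, with $n = 10$, $z = 2$, $r = 4$ and $\gamma = \delta = 1$ (illustrative values), both functions report 48 fewer messages and 3248 fewer bits.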


For the following bounds we consider $1 \le z \le n - 1$. The savings in gossip messages for LL-RAMBO
when $r > 2$ are between $\Omega(r^2 \cdot n)$ and $O(r^2 \cdot n^2)$. When $1 \le r \le 2$ there are no savings. The reduction in
gossip bit complexity for LL-RAMBO when $r > 1$ is between $\Omega(n^2)$ and $O(r^3 \cdot n^3)$ ($\gamma$ and $\delta$ are assumed to
be constants). When $r = 1$ there is no reduction.


6 LL-RAMBO Implementation 


We developed proof-of-concept implementations of RAMBO and LL-RAMBO on a network-of-workstations.
In this section we present preliminary experimental results.


Experimental Results. We developed the system by manually translating the Input/Output Automata 
specification to Java code. To mitigate the introduction of errors during translation, the implementers 
followed a set of precise rules [3] that guided the derivation of Java code. The platform consists of a Beowulf 
cluster with ten machines running Linux. The machines are various Pentium processors up to 900 MHz 
interconnected via a 100 Mbps Ethernet switch. The implementations of the two algorithms share most of
the code and all low-level routines, so that any difference in performance is traceable to the distinct world
and departed set management and the gossiping discipline encapsulated in each algorithm.

We are interested in long-lived applications and we assume that the number of participants grows
arbitrarily. Given the limited number of physical nodes, we use majority quorums of these nodes, and
we simulate a large number of other nodes that join the system by including such node identifiers in the
world sets. Using non-existent nodes approximates the behavior of a long-lived system with a large set of
participants. However, when using all-to-all gossip that grows quadratically in the number of participants, it
is expected that the differences in RAMBO and LL-RAMBO performance will become more substantial when
using a larger number of physical nodes.

The experiment is designed as follows. There are ten nodes that do not leave the system. These nodes 
perform concurrent read and write operations using a single configuration (that does not change over time), 
consisting of majorities, i.e., six nodes. Figure 4 compares (a) the average latency of gossip messages and
(b) the average latency of read and write operations in RAMBO and LL-RAMBO, as the cardinality of world
sets grows from 10 to 7010.
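For reference, a majority quorum system over ten nodes can be sketched as below. This is a generic illustration of majority quorums, not the code used in the experiments; the function names are hypothetical.

```python
from itertools import combinations


def majority_quorums(members):
    """All minimal majority quorums: subsets of size floor(n/2) + 1."""
    members = sorted(members)
    size = len(members) // 2 + 1
    return [frozenset(q) for q in combinations(members, size)]


def contains_quorum(responders, members):
    """True if the responders include a majority of the configuration members."""
    members = set(members)
    return len(set(responders) & members) >= len(members) // 2 + 1
```

With ten members every quorum has six nodes, and any two quorums share at least two members, which is the intersection property that read and write quorums rely on.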

LL-RAMBO exhibits substantially better gossip message latency than RAMBO (Fig. 4(a)). In fact the average
gossip latency in LL-RAMBO does not vary noticeably. On the other hand, the gossip latency in RAMBO grows
substantially as the cardinality of the world sets increases. This is expected due to the smaller incremental
gossip messages of LL-RAMBO, while in RAMBO, the size
of the gossip messages is always proportional to the cardinality of the world set. LL-RAMBO trades local
resources (computation and memory) for smaller and fewer gossip messages. We observe that the read/write
operation latency is slightly lower for RAMBO when the cardinality of the world sets is small (Fig. 4(b)). As
the size of the world sets grows, the operation latency in LL-RAMBO becomes substantially better than in
RAMBO.

We close this section with the following remarks. First, the experiment is designed using a single
configuration that does not change over time. Reconfiguration creates additional network traffic; however,
when reconfigurations are infrequent the performance of the implementation is not significantly affected. Since
the main aim of our algorithm is to reduce the size of the gossip messages, we decided not to reconfigure
during the experiment. Future experiments will include reconfiguration activities; however, it is also important
to evaluate the system when the reconfiguration traffic does not interfere with the routine gossip.

Second, the experiments were conducted using our preliminary implementation. This implementation is
a proof-of-concept for the methodology developed in [3] to translate the IOA specification into Java code.
Both RAMBO and LL-RAMBO have been implemented faithfully to their specifications; however, no attempt
was made to optimize either system. Therefore, the performance improvements in Figs. 4(a) and 4(b) should
be considered in relative terms.

Finally, observe that due to node failures some gossip messages may continue to grow without bound (since
the sender does not receive acknowledgments from the failed node). To diminish the communication and processing
impact in this case we use exponential backoff: if a node does not receive a gossip message from some other
node it will double the delay between consecutive gossip rounds to that node; normal timing is restored
once communication is reestablished. Note that both algorithms may take advantage of this technique. In
our experiments, gossip messages were not sent to the virtual-failed nodes (this approximates exponential
backoff in the case of permanently crashed nodes). The results in Figs. 4(a) and 4(b) illustrate the performance
of RAMBO and LL-RAMBO when gossip messages are exchanged only between non-failed nodes.
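The backoff rule described above (double the per-peer delay after a silent round, restore normal timing once a message arrives) can be sketched as follows. The class and method names are hypothetical and the sketch does not reflect the Java implementation; in particular, it imposes no upper bound on the delay because none is stated here.

```python
class GossipBackoff:
    """Per-peer gossip interval with exponential backoff."""

    def __init__(self, base_interval):
        self.base_interval = base_interval
        self._delay = {}  # peer -> current interval, only for backed-off peers

    def interval(self, peer):
        return self._delay.get(peer, self.base_interval)

    def on_silent_round(self, peer):
        # No gossip was received from this peer: double the delay to it.
        self._delay[peer] = self.interval(peer) * 2

    def on_message(self, peer):
        # Communication is reestablished: restore normal timing.
        self._delay.pop(peer, None)
```

A permanently crashed peer is then contacted at ever longer intervals, which is what skipping the virtual-failed nodes in the experiments approximates.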
Fig. 4: Preliminary empirical results: (a) gossip message latency, (b) read
and write latency.


7 Discussion and Future Work 


We presented an algorithm for long-lived atomic data in dynamic networks. Prior solutions for dynamic
networks [1, 2] did not allow the participants to leave gracefully and relied on gossip that involved sending
messages whose size grew with time. The new algorithm, called LL-RAMBO, improves on prior work by
supporting graceful departures of participants and implementing incremental gossip. The algorithm
substantially reduces the size and the number of gossip messages, leading to improved performance of the read
and write operations. We show that the improved algorithm implements atomic objects in the presence of
arbitrary asynchrony and dynamic node joins and departures. The algorithm relies on simple point-to-point
channels that permit message loss and reordering.

We analyzed the efficiency of the algorithm and illustrated its performance using our implementation of
the algorithm in the local-area setting. In trading knowledge for communication, the algorithm increases
the local memory requirements. Specifically, the needed local storage is quadratic in the number of active
participants. Future plans include exploring optimizations that will reduce the local storage usage to be
about linear in the number of participants. Also, we plan to further explore ways to decrease the number
of gossip messages sent. Currently, LL-RAMBO allows all-to-all exchange of information among active
participants, and when gossip is unconstrained and communication bandwidth is limited, network congestion
may degrade the performance of the system. One way to solve this problem is to constrain the gossip patterns.
Restricting gossip vacuously preserves the linearizability property of the algorithm, but will possibly alter
its performance.
APPENDIX
A RAMBO Specification 


Signature:

Input:
  join(rambo, J)_i, J a finite subset of I − {i}
  join-ack(r)_i, r ∈ {recon, rw}
  fail_i
Output:
  send(join)_{i,j}, j ∈ I − {i}
  join(r)_i, r ∈ {recon, rw}
  join-ack(rambo)_i

State:

  status ∈ {idle, joining, active}, initially idle
  child-status ∈ {recon, rw} → {idle, joining, active}, initially everywhere idle
  hints ⊆ I, initially ∅
  failed, a Boolean, initially false

Transitions:

Input join(rambo, J)_i
Effect:
  if ¬failed then
    if status = idle then
      status ← joining
      hints ← J

Output send(join)_{i,j}
Precondition:
  ¬failed
  status = joining
  j ∈ hints
Effect:
  none

Output join(r)_i
Precondition:
  ¬failed
  status = joining
  child-status(r) = idle
Effect:
  child-status(r) ← joining

Input join-ack(r)_i
Effect:
  if ¬failed then
    if status = joining then
      child-status(r) ← active

Output join-ack(rambo)_i
Precondition:
  ¬failed
  status = joining
  ∀r ∈ {recon, rw} : child-status(r) = active
Effect:
  status ← active

Input fail_i
Effect:
  failed ← true


Fig. 5: Joiner_i: Signature, state, and transitions
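The Joiner automaton is small enough to transcribe directly. The sketch below is one reading of Fig. 5 in which every precondition and effect is guarded by "not failed"; the class and method names are illustrative and are not taken from the Java implementation mentioned in Section 6.

```python
class Joiner:
    """Sketch of Joiner_i: inputs are plain methods, outputs are try_* methods
    that apply the effect and return True only if the precondition holds."""

    CHILDREN = ("recon", "rw")

    def __init__(self):
        self.status = "idle"
        self.child_status = {r: "idle" for r in self.CHILDREN}
        self.hints = set()
        self.failed = False

    # Input actions
    def join_rambo(self, hints):
        if not self.failed and self.status == "idle":
            self.status = "joining"
            self.hints = set(hints)

    def join_ack(self, r):
        if not self.failed and self.status == "joining":
            self.child_status[r] = "active"

    def fail(self):
        self.failed = True

    # Output actions
    def send_join_targets(self):
        """Nodes j for which send(join)_{i,j} is currently enabled."""
        if not self.failed and self.status == "joining":
            return set(self.hints)
        return set()

    def try_join_child(self, r):
        if (not self.failed and self.status == "joining"
                and self.child_status[r] == "idle"):
            self.child_status[r] = "joining"
            return True
        return False

    def try_join_ack_rambo(self):
        if (not self.failed and self.status == "joining"
                and all(s == "active" for s in self.child_status.values())):
            self.status = "active"
            return True
        return False
```

A node thus becomes active only after both the Recon and the Reader-Writer components have acknowledged the join.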


Signature:

Input:
  read_i
  write(v)_i, v ∈ V
  new-config(c, k)_i, c ∈ C, k ∈ N^+
  recv(join)_{j,i}, j ∈ I − {i}
  recv(m)_{j,i}, m ∈ M, j ∈ I
  join(rw)_i
  fail_i
Output:
  join-ack(rw)_i
  read-ack(v)_i, v ∈ V
  write-ack_i
  send(m)_{i,j}, m ∈ M, j ∈ I
Internal:
  query-fix_i
  prop-fix_i
  cfg-upgrade(k)_i, k ∈ N^{>0}
  cfg-upg-query-fix(k)_i, k ∈ N^{>0}
  cfg-upg-prop-fix(k)_i, k ∈ N^{>0}
  cfg-upgrade-ack(k)_i, k ∈ N^{>0}

State:

  status ∈ {idle, joining, active}, initially idle
  world, a finite subset of I, initially ∅
  value ∈ V, initially v_0
  tag ∈ T, initially (0, i_0)
  cmap ∈ CMap, initially cmap(0) = c_0, cmap(k) = ⊥ for k ≥ 1
  pnum1 ∈ N, initially 0
  pnum2 ∈ I → N, initially everywhere 0
  failed, a Boolean, initially false

  op, a record with fields:
    type ∈ {read, write}
    phase ∈ {idle, query, prop, done}, initially idle
    pnum ∈ N
    cmap ∈ CMap
    acc, a finite subset of I
    value ∈ V

  upg, a record with fields:
    phase ∈ {idle, query, prop}, initially idle
    pnum ∈ N
    cmap ∈ CMap
    acc, a finite subset of I
    target ∈ N


Fig. 6: Reader-Writer_i: Signature and state


Input join(rw)_i
Effect:
  if ¬failed then
    if status = idle then
      if i = i_0 then
        status ← active
      else
        status ← joining
      world ← world ∪ {i}

Input recv(join)_{j,i}
Effect:
  if ¬failed then
    if status ≠ idle then
      world ← world ∪ {j}

Output join-ack(rw)_i
Precondition:
  ¬failed
  status = active
Effect:
  none


Fig. 7: Reader-Writer_i: Join-related transitions


Internal cfg-upgrade(k)_i
Precondition:
  ¬failed
  status = active
  upg.phase = idle
  cmap(k) ∈ C
  ∀ℓ ∈ N, ℓ < k : cmap(ℓ) ≠ ⊥
Effect:
  pnum1 ← pnum1 + 1
  upg ← ⟨query, pnum1, cmap, ∅, k⟩

Internal cfg-upg-query-fix(k)_i
Precondition:
  ¬failed
  status = active
  upg.phase = query
  upg.target = k
  ∀ℓ ∈ N, ℓ < k : upg.cmap(ℓ) ∈ C
    ⇒ ∃R ∈ read-quorums(upg.cmap(ℓ)) :
       ∃W ∈ write-quorums(upg.cmap(ℓ)) :
       R ∪ W ⊆ upg.acc
Effect:
  pnum1 ← pnum1 + 1
  upg.pnum ← pnum1
  upg.phase ← prop
  upg.acc ← ∅

Internal cfg-upg-prop-fix(k)_i
Precondition:
  ¬failed
  status = active
  upg.phase = prop
  upg.target = k
  ∃W ∈ write-quorums(upg.cmap(k)) : W ⊆ upg.acc
Effect:
  for ℓ ∈ N : ℓ < k do
    cmap(ℓ) ← ±

Internal cfg-upgrade-ack(k)_i
Precondition:
  ¬failed
  status = active
  upg.target = k
  ∀ℓ ∈ N, ℓ < k : cmap(ℓ) = ±
Effect:
  upg.phase ← idle


Fig. 8: Reader-Writer_i: Configuration-Management transitions


Output send(⟨W, v, t, cm, pns, pnr⟩)_{i,j}
Precondition:
  ¬failed
  status = active
  j ∈ world
  ⟨W, v, t, cm, pns, pnr⟩ = ⟨world, value, tag, cmap, pnum1, pnum2(j)⟩
Effect:
  none

Input recv(⟨W, v, t, cm, pns, pnr⟩)_{j,i}
Effect:
  if ¬failed then
    if status ≠ idle then
      status ← active
      world ← world ∪ W
      if t > tag then (value, tag) ← (v, t)
      cmap ← update(cmap, cm)
      pnum2(j) ← max(pnum2(j), pns)
      if op.phase ∈ {query, prop} and pnr ≥ op.pnum then
        op.cmap ← extend(op.cmap, truncate(cm))
        if op.cmap ∈ Truncated then
          op.acc ← op.acc ∪ {j}
        else
          pnum1 ← pnum1 + 1
          op.acc ← ∅
          op.cmap ← truncate(cmap)
      if upg.phase ≠ idle and pnr ≥ upg.pnum then
        upg.acc ← upg.acc ∪ {j}

Input new-config(c, k)_i
Effect:
  if ¬failed then
    if status ≠ idle then
      cmap(k) ← update(cmap(k), c)

Input read_i
Effect:
  if ¬failed then
    if status ≠ idle then
      pnum1 ← pnum1 + 1
      ⟨op.pnum, op.type, op.phase, op.cmap, op.acc⟩
        ← ⟨pnum1, read, query, truncate(cmap), ∅⟩

Input write(v)_i
Effect:
  if ¬failed then
    if status ≠ idle then
      pnum1 ← pnum1 + 1
      ⟨op.pnum, op.type, op.phase, op.cmap, op.acc, op.value⟩
        ← ⟨pnum1, write, query, truncate(cmap), ∅, v⟩

Internal query-fix_i
Precondition:
  ¬failed
  status = active
  op.type ∈ {read, write}
  op.phase = query
  ∀k ∈ N, c ∈ C : (op.cmap(k) = c)
    ⇒ (∃R ∈ read-quorums(c) : R ⊆ op.acc)
Effect:
  if op.type = read then
    op.value ← value
  else
    value ← op.value
    tag ← ⟨tag.seq + 1, i⟩
  pnum1 ← pnum1 + 1
  op.pnum ← pnum1
  op.phase ← prop
  op.cmap ← truncate(cmap)
  op.acc ← ∅

Internal prop-fix_i
Precondition:
  ¬failed
  status = active
  op.type ∈ {read, write}
  op.phase = prop
  ∀k ∈ N, c ∈ C : (op.cmap(k) = c)
    ⇒ (∃W ∈ write-quorums(c) : W ⊆ op.acc)
Effect:
  op.phase ← done

Output read-ack(v)_i
Precondition:
  ¬failed
  status = active
  op.type = read
  op.phase = done
  v = op.value
Effect:
  op.phase ← idle

Output write-ack_i
Precondition:
  ¬failed
  status = active
  op.type = write
  op.phase = done
Effect:
  op.phase ← idle

Input fail_i
Effect:
  failed ← true


Fig. 9: Reader-Writer_i: Read/write and failure transitions
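The part of the gossip handler that keeps replicas consistent is the tag comparison: a received value replaces the local one only if its tag is larger. The sketch below isolates that step under simplifying assumptions: tags are modelled as (sequence number, node id) pairs compared lexicographically, and configuration maps, phase numbers, and quorum accounting are left out. All names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaState:
    node_id: str
    world: set = field(default_factory=set)
    value: object = None
    tag: tuple = (0, "")     # (sequence number, node id)
    status: str = "joining"  # "idle", "joining" or "active"

    def on_gossip(self, sender_world, v, t):
        """World/value/tag portion of the recv effect for a gossip message."""
        if self.status == "idle":
            return
        self.status = "active"
        self.world |= set(sender_world)
        if t > self.tag:  # adopt the value only if its tag is strictly larger
            self.value, self.tag = v, t

    def tag_for_write(self):
        """Tag chosen when a write's query phase completes: next sequence
        number, with the writer's id breaking ties."""
        return (self.tag[0] + 1, self.node_id)
```

Because node ids break ties, two writers that pick the same sequence number still produce distinct, totally ordered tags.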


Input:
  init(v)_{k,c,i}, v ∈ V, i ∈ members(c)
  fail_i, i ∈ members(c)
Output:
  decide(v)_{k,c,i}, v ∈ V, i ∈ members(c)


Fig. 10: Cons(k,c): External signature (Cons(k,c) is an external consensus service)


Signature:

Input:
  join(recon)_i
  recon(c, c')_i, c, c' ∈ C, i ∈ members(c)
  decide(c)_{k,i}, c ∈ C, k ∈ N^+
  recv(⟨config, c, k⟩)_{j,i}, c ∈ C, k ∈ N^+, i ∈ members(c), j ∈ I − {i}
  recv(⟨init, c, c', k⟩)_{j,i}, c, c' ∈ C, k ∈ N^+, i, j ∈ members(c), j ≠ i
  fail_i
Output:
  join-ack(recon)_i
  new-config(c, k)_i, c ∈ C, k ∈ N^+
  init(c, c')_{k,i}, c, c' ∈ C, k ∈ N^+, i ∈ members(c)
  recon-ack(b)_i, b ∈ {ok, nok}
  report(c)_i, c ∈ C
  send(⟨config, c, k⟩)_{i,j}, c ∈ C, k ∈ N^+, j ∈ members(c) − {i}
  send(⟨init, c, c', k⟩)_{i,j}, c, c' ∈ C, k ∈ N^+, i, j ∈ members(c), j ≠ i

State:

  status ∈ {idle, active}, initially idle
  rec-cmap ∈ CMap, initially rec-cmap(0) = c_0 and rec-cmap(k) = ⊥ for all k ≠ 0
  did-new-config ⊆ N^+, initially ∅
  reported ⊆ C, initially ∅
  op-status ∈ {idle, active}, initially idle
  op-outcome ∈ {ok, nok, ⊥}, initially ⊥
  cons-data ∈ (N^+ → (C × C)), initially everywhere ⊥
  did-init ⊆ N^+, initially ∅
  failed, a Boolean, initially false


Fig. 11: Recon_i: Signature and state


Input join(recon)_i
Effect:
  if ¬failed then
    if status = idle then
      status ← active

Output join-ack(recon)_i
Precondition:
  ¬failed
  status = active
Effect:
  none

Output new-config(c, k)_i
Precondition:
  ¬failed
  status = active
  rec-cmap(k) = c
  k ∉ did-new-config
Effect:
  did-new-config ← did-new-config ∪ {k}

Output send(⟨config, c, k⟩)_{i,j}
Precondition:
  ¬failed
  status = active
  rec-cmap(k) = c
Effect:
  none

Input recv(⟨config, c, k⟩)_{j,i}
Effect:
  if ¬failed then
    if status = active then
      rec-cmap(k) ← c

Output report(c)_i
Precondition:
  ¬failed
  status = active
  c = rec-cmap(k)
  ∀ℓ > k : rec-cmap(ℓ) = ⊥
  c ∉ reported
Effect:
  reported ← reported ∪ {c}

Input recon(c, c')_i
Effect:
  if ¬failed then
    if status = active then
      op-status ← active
      let k = max({ℓ : rec-cmap(ℓ) ∈ C})
      if c = rec-cmap(k) and cons-data(k + 1) = ⊥ then
        cons-data(k + 1) ← ⟨c, c'⟩
        op-outcome ← ⊥
      else
        op-outcome ← nok

Output init(c')_{k,c,i}
Precondition:
  ¬failed
  status = active
  cons-data(k) = ⟨c, c'⟩
  if k > 1 then k − 1 ∈ did-new-config
  k ∉ did-init
Effect:
  did-init ← did-init ∪ {k}

Output send(⟨init, c, c', k⟩)_{i,j}
Precondition:
  ¬failed
  status = active
  cons-data(k) = ⟨c, c'⟩
  k ∈ did-init
Effect:
  none

Input recv(⟨init, c, c', k⟩)_{j,i}
Effect:
  if ¬failed then
    if status = active then
      if rec-cmap(k − 1) = ⊥ then rec-cmap(k − 1) ← c
      if cons-data(k) = ⊥ then cons-data(k) ← ⟨c, c'⟩

Input decide(c')_{k,c,i}
Effect:
  if ¬failed then
    if status = active then
      rec-cmap(k) ← c'
      if op-status = active then
        if cons-data(k) = ⟨c, c'⟩ then op-outcome ← ok
        else op-outcome ← nok

Output recon-ack(b)_i
Precondition:
  ¬failed
  status = active
  op-status = active
  op-outcome = b
Effect:
  op-status ← idle

Input fail_i
Effect:
  failed ← true


Fig. 12: Recon_i: Transitions.


